Review of Probability Theory


The focus of this course is on digital communication, which involves transmission of 
information, in its most general sense, from source to destination using digital technology. 
Engineering such a system requires modeling both the information and the transmission 
media. Interestingly, modeling both digital or analog information and many physical 
media requires a probabilistic setting. In this chapter and in the next one we will review 
the theory of probability, model random signals, and characterize their behavior as they 
traverse through deterministic systems disturbed by noise and interference. In order to 
develop practical models for random phenomena we start with carrying out a random 
experiment. We then introduce definitions, rules, and axioms for modeling within the 
context of the experiment. The outcome of a random experiment is denoted by $\omega$. The
sample space $\Omega$ is the set of all possible outcomes of a random experiment. Such
outcomes could be an abstract description in words. A scientific experiment should indeed 
be repeatable where each outcome could naturally have an associated probability of 
occurrence. This is defined formally as the ratio of the number of times the outcome 
occurs to the total number of times the experiment is repeated. 


Random Variables 


A random variable is the assignment of a real number to each outcome of a random 
experiment. 


$X : \Omega \to \mathbb{R}$, $\omega \mapsto X(\omega)$


Example: 

Roll a die. Outcomes $\{\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \omega_6\}$, where
$\omega_i$ = $i$ dots on the face of the die.

$X(\omega_i) = i$


Distributions 


Probability assignments on intervals $a < X \le b$


Cumulative distribution 
The cumulative distribution function of a random variable $X$ is a function
$F_X : \mathbb{R} \to \mathbb{R}$ such that
Equation:
$$F_X(b) = \Pr[X \le b] = \Pr[\{\omega \in \Omega \mid X(\omega) \le b\}]$$


Continuous Random Variable 
A random variable $X$ is continuous if the cumulative distribution function can be
written in an integral form, or
Equation:
$$F_X(b) = \int_{-\infty}^{b} f_X(x)\,dx$$
and $f_X(x)$ is the probability density function (pdf) (e.g., $F_X(x)$ is differentiable and
$f_X(x) = \frac{d}{dx} F_X(x)$).


Discrete Random Variable 
A random variable $X$ is discrete if it takes at most countably many values (i.e.,
$F_X(\cdot)$ is piecewise constant). The probability mass function (pmf) is defined as
Equation:
$$p_X(x_k) = \Pr[X = x_k]$$


Two random variables defined on an experiment have joint distribution
Equation:
$$F_{X,Y}(a, b) = \Pr[X \le a, Y \le b]$$


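Joint pdf can be obtained if they are jointly continuous
Equation:
$$F_{X,Y}(a, b) = \int_{-\infty}^{b} \int_{-\infty}^{a} f_{X,Y}(x, y)\,dx\,dy$$
(e.g., $f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\,\partial y}$)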


Joint pmf if they are jointly discrete
Equation:
$$p_{X,Y}(x_k, y_l) = \Pr[X = x_k, Y = y_l]$$


Conditional density function
Equation:
$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$$
for all $x$ with $f_X(x) > 0$; otherwise the conditional density is not defined for those values of $x$
with $f_X(x) = 0$.


Two random variables are independent if
Equation:
$$f_{X,Y}(x, y) = f_X(x) f_Y(y)$$
for all $x$ and $y$. For discrete random variables,
Equation:
$$p_{X,Y}(x_k, y_l) = p_X(x_k)\, p_Y(y_l)$$
for all $k$ and $l$.
Moments 


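Statistical quantities to represent some of the characteristics of a random variable.
Equation:
$$\overline{g(X)} = E[g(X)] = \begin{cases} \int_{-\infty}^{\infty} g(x) f_X(x)\,dx & \text{if continuous} \\ \sum_k g(x_k)\, p_X(x_k) & \text{if discrete} \end{cases}$$

• Mean
Equation:
$$\mu_X = \overline{X}$$
• Second moment
Equation:
$$E[X^2] = \overline{X^2}$$
• Variance
Equation:
$$\operatorname{Var}(X) = \sigma_X^2 = \overline{(X - \mu_X)^2} = \overline{X^2} - \mu_X^2$$
• Characteristic function
Equation:
$$\Phi_X(u) = \overline{e^{iuX}}$$
for $u \in \mathbb{R}$, where $i = \sqrt{-1}$.
• Correlation between two random variables
Equation:
$$R_{XY} = \overline{XY} = \begin{cases} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x y\, f_{X,Y}(x, y)\,dx\,dy & \text{if jointly continuous} \\ \sum_k \sum_l x_k y_l\, p_{X,Y}(x_k, y_l) & \text{if jointly discrete} \end{cases}$$
• Covariance
Equation:
$$C_{XY} = \operatorname{Cov}(X, Y) = \overline{(X - \mu_X)(Y - \mu_Y)} = R_{XY} - \mu_X \mu_Y$$
• Correlation coefficient
Equation:
$$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$


Uncorrelated random variables
Two random variables $X$ and $Y$ are uncorrelated if $\rho_{XY} = 0$.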


Introduction to Stochastic Processes 


Definitions, distributions, and stationarity 


Stochastic Process 
Given a sample space, a stochastic process is an indexed collection of random variables defined for each
$\omega \in \Omega$.
Equation:
$$\forall t,\ t \in \mathbb{R} : X_t(\omega)$$


Example: 
Received signal at an antenna as in [link]. 
Sample Paths 


For a given $t$, $X_t(\omega)$ is a random variable with a distribution
Equation:
First-order distribution
$$F_{X_t}(b) = \Pr[X_t \le b] = \Pr[\{\omega \in \Omega \mid X_t(\omega) \le b\}]$$


First-order stationary process
If $F_{X_t}(b)$ is not a function of time then $X_t$ is called a first-order stationary process.


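Equation:
Second-order distribution
$$F_{X_{t_1}, X_{t_2}}(b_1, b_2) = \Pr[X_{t_1} \le b_1, X_{t_2} \le b_2]$$
for all $t_1 \in \mathbb{R}$, $t_2 \in \mathbb{R}$, $b_1 \in \mathbb{R}$, $b_2 \in \mathbb{R}$
Equation:
Nth-order distribution
$$F_{X_{t_1}, X_{t_2}, \ldots, X_{t_N}}(b_1, b_2, \ldots, b_N) = \Pr[X_{t_1} \le b_1, \ldots, X_{t_N} \le b_N]$$


Nth-order stationary: A random process is stationary of order $N$ if
Equation:
$$F_{X_{t_1}, X_{t_2}, \ldots, X_{t_N}}(b_1, b_2, \ldots, b_N) = F_{X_{t_1 + T}, X_{t_2 + T}, \ldots, X_{t_N + T}}(b_1, b_2, \ldots, b_N)$$


Strictly stationary: A process is strictly stationary if it is Nth-order stationary for all $N$.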


Example: 


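$X_t = \cos(2\pi f_0 t + \Theta(\omega))$ where $f_0$ is the deterministic carrier frequency and
$\Theta(\omega) : \Omega \to \mathbb{R}$ is a random variable defined over $[-\pi, \pi]$ and is assumed
to be a uniform random variable; i.e.,
$$f_\Theta(\theta) = \begin{cases} \frac{1}{2\pi} & \text{if } \theta \in [-\pi, \pi] \\ 0 & \text{otherwise} \end{cases}$$

Equation:
$$F_{X_t}(b) = \Pr[X_t \le b] = \Pr[\cos(2\pi f_0 t + \Theta) \le b]$$
Equation:
$$F_{X_t}(b) = \Pr[-\pi \le 2\pi f_0 t + \Theta \le -\arccos(b)] + \Pr[\arccos(b) \le 2\pi f_0 t + \Theta \le \pi]$$
Equation:
$$F_{X_t}(b) = \int_{-\pi - 2\pi f_0 t}^{-\arccos(b) - 2\pi f_0 t} \frac{1}{2\pi}\,d\theta + \int_{\arccos(b) - 2\pi f_0 t}^{\pi - 2\pi f_0 t} \frac{1}{2\pi}\,d\theta = (2\pi - 2\arccos(b)) \frac{1}{2\pi}$$
Equation:
$$f_{X_t}(x) = \frac{d}{dx}\left(1 - \frac{1}{\pi}\arccos(x)\right) = \begin{cases} \frac{1}{\pi\sqrt{1 - x^2}} & \text{if } |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

This process is stationary of order 1.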

Figure: Plots of cosines with different phases and the same frequency.


The second-order stationarity can be determined by first considering conditional densities and the joint
density. Recall that
Equation:
$$X_t = \cos(2\pi f_0 t + \Theta)$$
Then the relevant step is to find
Equation:
$$\Pr[X_{t_2} \le b_2 \mid X_{t_1} = x_1]$$
Note that
Equation:
$$(X_{t_1} = x_1 = \cos(2\pi f_0 t_1 + \Theta)) \Rightarrow (\Theta = \arccos(x_1) - 2\pi f_0 t_1)$$
Equation:
$$X_{t_2} = \cos(2\pi f_0 t_2 + \arccos(x_1) - 2\pi f_0 t_1) = \cos(2\pi f_0 (t_2 - t_1) + \arccos(x_1))$$

Figure: $\Pr[X_{t_2} \le b_2 \mid X_{t_1} = x_1]$ determined from $\cos(2\pi f_0 (t_2 - t_1) + \arccos(x_1))$.

Equation:
$$F_{X_{t_2}, X_{t_1}}(b_2, b_1) = \int_{-\infty}^{b_1} f_{X_{t_1}}(x_1) \Pr[X_{t_2} \le b_2 \mid X_{t_1} = x_1]\,dx_1$$

Note that this is only a function of $t_2 - t_1$.


Example: 
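Every $T$ seconds, a fair coin is tossed. If heads, then $X_t = 1$ for $nT \le t < (n+1)T$. If tails, then
$X_t = -1$ for $nT \le t < (n+1)T$.

Figure: Sample function of $X_t$.

Equation:
$$p_{X_t}(x) = \begin{cases} \frac{1}{2} & \text{if } x = 1 \\ \frac{1}{2} & \text{if } x = -1 \end{cases}$$
for all $t \in \mathbb{R}$. $X_t$ is stationary of order 1.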
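Second-order probability mass function
Equation:
$$p_{X_{t_1}, X_{t_2}}(x_1, x_2) = p_{X_{t_2} | X_{t_1}}(x_2 \mid x_1)\, p_{X_{t_1}}(x_1)$$

The conditional pmf
Equation:
$$p_{X_{t_2} | X_{t_1}}(x_2 \mid x_1) = \begin{cases} 0 & \text{if } x_2 \ne x_1 \\ 1 & \text{if } x_2 = x_1 \end{cases}$$
when $nT \le t_1 < (n+1)T$ and $nT \le t_2 < (n+1)T$ for some $n$.
Equation:
$$p_{X_{t_2} | X_{t_1}}(x_2 \mid x_1) = p_{X_{t_2}}(x_2)$$
for all $x_1$ and for all $x_2$ when $nT \le t_1 < (n+1)T$ and $mT \le t_2 < (m+1)T$ with $n \ne m$
Equation:
$$p_{X_{t_2}, X_{t_1}}(x_2, x_1) = \begin{cases} 0 & \text{if } x_2 \ne x_1 \text{ for } nT \le t_1, t_2 < (n+1)T \\ p_{X_{t_1}}(x_1) & \text{if } x_2 = x_1 \text{ for } nT \le t_1, t_2 < (n+1)T \\ p_{X_{t_1}}(x_1)\, p_{X_{t_2}}(x_2) & \text{if } n \ne m \text{ for } (nT \le t_1 < (n+1)T) \wedge (mT \le t_2 < (m+1)T) \end{cases}$$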


Second-order Description 


Second-order description 


Practical and incomplete statistics 


Mean 
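The mean function of a random process $X_t$ is defined as the expected value of
$X_t$ for all $t$.
Equation:
$$\mu_X(t) = E[X_t] = \begin{cases} \int_{-\infty}^{\infty} x f_{X_t}(x)\,dx & \text{if continuous} \\ \sum_{k=-\infty}^{\infty} x_k\, p_{X_t}(x_k) & \text{if discrete} \end{cases}$$
Autocorrelation
The autocorrelation function of the random process $X_t$ is defined as
Equation:
$$R_X(t_2, t_1) = E[X_{t_2} X_{t_1}] = \begin{cases} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 x_1 f_{X_{t_2}, X_{t_1}}(x_2, x_1)\,dx_1\,dx_2 & \text{if continuous} \\ \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} x_l x_k\, p_{X_{t_2}, X_{t_1}}(x_l, x_k) & \text{if discrete} \end{cases}$$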


Fact 


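If $X_t$ is second-order stationary, then $R_X(t_2, t_1)$ only depends on $t_2 - t_1$.
Equation:
$$R_X(t_2, t_1) = E[X_{t_2} X_{t_1}] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 x_1 f_{X_{t_2}, X_{t_1}}(x_2, x_1)\,dx_2\,dx_1$$
Equation:
$$R_X(t_2, t_1) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 x_1 f_{X_{t_2 - t_1}, X_0}(x_2, x_1)\,dx_2\,dx_1 = R_X(t_2 - t_1, 0)$$

If $R_X(t_2, t_1)$ depends on $t_2 - t_1$ only, then we will represent the autocorrelation
with only one variable $\tau = t_2 - t_1$
Equation:
$$R_X(\tau) = R_X(t_2 - t_1) = R_X(t_2, t_1)$$
Properties
1. $R_X(0) \ge 0$
2. $R_X(\tau) = R_X(-\tau)$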


Example: 

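$X_t = \cos(2\pi f_0 t + \Theta(\omega))$ and $\Theta$ is uniformly distributed between $0$ and $2\pi$. The
mean function
Equation:
$$\mu_X(t) = E[X_t] = E[\cos(2\pi f_0 t + \Theta)] = \int_0^{2\pi} \cos(2\pi f_0 t + \theta) \frac{1}{2\pi}\,d\theta = 0$$

The autocorrelation function
Equation:
$$\begin{aligned} R_X(t + \tau, t) &= E[X_{t+\tau} X_t] \\ &= E[\cos(2\pi f_0 (t + \tau) + \Theta) \cos(2\pi f_0 t + \Theta)] \\ &= \tfrac{1}{2} E[\cos(2\pi f_0 \tau)] + \tfrac{1}{2} E[\cos(2\pi f_0 (2t + \tau) + 2\Theta)] \\ &= \tfrac{1}{2} \cos(2\pi f_0 \tau) + \tfrac{1}{2} \int_0^{2\pi} \cos(2\pi f_0 (2t + \tau) + 2\theta) \frac{1}{2\pi}\,d\theta \\ &= \tfrac{1}{2} \cos(2\pi f_0 \tau) \end{aligned}$$

Not a function of $t$ since the second term in the right hand side of the equality in
[link] is zero.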
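
The result for the random-phase cosine can be checked numerically. The following is a minimal Monte Carlo sketch, assuming NumPy is available; the function name `estimate_moments` and the parameter values are illustrative choices, not part of the original module. It estimates the mean and the autocorrelation at one pair of times and compares the latter with $\frac{1}{2}\cos(2\pi f_0 \tau)$.

```python
import numpy as np

def estimate_moments(f0=5.0, t=0.3, tau=0.07, n=200_000, seed=0):
    """Monte Carlo estimate of the mean and autocorrelation of
    X_t = cos(2*pi*f0*t + Theta), Theta uniform on [0, 2*pi)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)      # one phase per sample path
    x_t = np.cos(2 * np.pi * f0 * t + theta)            # X_t
    x_t_tau = np.cos(2 * np.pi * f0 * (t + tau) + theta)  # X_{t+tau}
    mean_hat = x_t.mean()                               # estimate of mu_X(t)
    r_hat = (x_t_tau * x_t).mean()                      # estimate of R_X(t+tau, t)
    r_theory = 0.5 * np.cos(2 * np.pi * f0 * tau)       # closed form from the text
    return mean_hat, r_hat, r_theory

mean_hat, r_hat, r_theory = estimate_moments()
print(mean_hat, r_hat, r_theory)
```

Repeating the call with a different `t` but the same `tau` should give approximately the same autocorrelation estimate, which is what dependence on $\tau$ alone means.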


Example: 

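Toss a fair coin every $T$ seconds. Since $X_t$ is a discrete valued random process, the
statistical characteristics can be captured by the pmf and the mean function is
written as
Equation:
$$\mu_X(t) = E[X_t] = \tfrac{1}{2} \times (-1) + \tfrac{1}{2} \times 1 = 0$$
Equation:
$$R_X(t_2, t_1) = \sum_{k}\sum_{l} x_k x_l\, p_{X_{t_2}, X_{t_1}}(x_k, x_l) = 1 \times 1 \times \tfrac{1}{2} + (-1) \times (-1) \times \tfrac{1}{2} = 1$$
when $nT \le t_1 < (n+1)T$ and $nT \le t_2 < (n+1)T$
Equation:
$$R_X(t_2, t_1) = 1 \times 1 \times \tfrac{1}{4} + (-1) \times (-1) \times \tfrac{1}{4} + (-1) \times 1 \times \tfrac{1}{4} + 1 \times (-1) \times \tfrac{1}{4} = 0$$
when $nT \le t_1 < (n+1)T$ and $mT \le t_2 < (m+1)T$ with $n \ne m$
Equation:
$$R_X(t_2, t_1) = \begin{cases} 1 & \text{if } (nT \le t_1 < (n+1)T) \wedge (nT \le t_2 < (n+1)T) \\ 0 & \text{otherwise} \end{cases}$$

A function of $t_1$ and $t_2$.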
Wide Sense Stationary 


A process is said to be wide sense stationary if $\mu_X$ is constant and $R_X(t_2, t_1)$
is only a function of $t_2 - t_1$.


Fact 


If $X_t$ is strictly stationary, then it is wide sense stationary. The converse is not
necessarily true.


Autocovariance 
Autocovariance of a random process is defined as
Equation:
$$C_X(t_2, t_1) = E[(X_{t_2} - \mu_X(t_2))(X_{t_1} - \mu_X(t_1))] = R_X(t_2, t_1) - \mu_X(t_2)\mu_X(t_1)$$

The variance of $X_t$ is $\operatorname{Var}(X_t) = C_X(t, t)$


Two processes $X_t$ and $Y_t$ defined on one experiment ([link]).

Crosscorrelation
The crosscorrelation function of a pair of random processes is defined as
Equation:
$$R_{XY}(t_2, t_1) = E[X_{t_2} Y_{t_1}] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x y f_{X_{t_2}, Y_{t_1}}(x, y)\,dx\,dy$$
Equation:
$$C_{XY}(t_2, t_1) = R_{XY}(t_2, t_1) - \mu_X(t_2)\mu_Y(t_1)$$


Jointly Wide Sense Stationary
The random processes $X_t$ and $Y_t$ are said to be jointly wide sense stationary if
$R_{XY}(t_2, t_1)$ is a function of $t_2 - t_1$ only and $\mu_X(t)$ and $\mu_Y(t)$ are constant.


Linear Filtering 


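Equation:
Integration
$$Z(\omega) = \int_a^b X_t(\omega)\,dt$$
Equation:
Linear Processing
$$Y_t = \int_{-\infty}^{\infty} h(t, \tau) X_\tau\,d\tau$$
Equation:
Differentiation
$$X_t' = \frac{d}{dt}(X_t)$$
Properties

1. $\overline{Z} = \overline{\int_a^b X_t(\omega)\,dt} = \int_a^b \mu_X(t)\,dt$

2. $\overline{Z^2} = \overline{\int_a^b X_{t_2}\,dt_2 \int_a^b X_{t_1}\,dt_1} = \int_a^b\int_a^b R_X(t_2, t_1)\,dt_1\,dt_2$

Equation:
$$\mu_Y(t) = \overline{\int_{-\infty}^{\infty} h(t, \tau) X_\tau\,d\tau} = \int_{-\infty}^{\infty} h(t, \tau)\mu_X(\tau)\,d\tau$$

If $X_t$ is wide sense stationary and the linear system is time invariant
Equation:
$$\mu_Y(t) = \int_{-\infty}^{\infty} h(t - \tau)\mu_X\,d\tau = \mu_X \int_{-\infty}^{\infty} h(t')\,dt'$$
Equation:
$$R_{YX}(t_2, t_1) = \overline{Y_{t_2} X_{t_1}} = \overline{\int_{-\infty}^{\infty} h(t_2 - \tau) X_\tau\,d\tau\, X_{t_1}} = \int_{-\infty}^{\infty} h(t_2 - \tau) R_X(\tau - t_1)\,d\tau$$
Equation:
$$R_{YX}(t_2, t_1) = \int_{-\infty}^{\infty} h(t_2 - t_1 - \tau') R_X(\tau')\,d\tau' = h * R_X(t_2 - t_1)$$
where $\tau' = \tau - t_1$.
Equation:
$$R_Y(t_2, t_1) = \overline{Y_{t_2} Y_{t_1}} = \overline{Y_{t_2} \int_{-\infty}^{\infty} h(t_1, \tau) X_\tau\,d\tau} = \int_{-\infty}^{\infty} h(t_1, \tau) R_{YX}(t_2, \tau)\,d\tau = \int_{-\infty}^{\infty} h(t_1 - \tau) R_{YX}(t_2 - \tau)\,d\tau$$
Equation:
$$R_Y(t_2, t_1) = \int_{-\infty}^{\infty} h(\tau' - (t_2 - t_1)) R_{YX}(\tau')\,d\tau' = R_Y(t_2 - t_1) = \tilde{h} * h * R_X(t_2 - t_1)$$
where $\tau' = t_2 - \tau$ and $\tilde{h}(\tau) = h(-\tau)$ for all $\tau \in \mathbb{R}$. $Y_t$ is WSS if $X_t$ is
WSS and the linear system is time-invariant.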


Example: 

$X_t$ is a wide sense stationary process with $\mu_X = 0$ and $R_X(\tau) = \frac{N_0}{2}\delta(\tau)$.
Consider the random process going through a filter with impulse response
$h(t) = e^{-at} u(t)$. The output process is denoted by $Y_t$. $\mu_Y(t) = 0$ for all $t$.

Equation:
$$R_Y(\tau) = \frac{N_0}{2} \int_{-\infty}^{\infty} h(\alpha) h(\alpha - \tau)\,d\alpha = \frac{N_0}{2} \frac{e^{-a|\tau|}}{2a}$$

$X_t$ is called a white process. $Y_t$ is a Markov process.
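
The closed form $R_Y(\tau) = \frac{N_0}{4a} e^{-a|\tau|}$ can be cross-checked by evaluating the integral $\frac{N_0}{2}\int h(\alpha) h(\alpha - \tau)\,d\alpha$ numerically. This is a rough sketch assuming NumPy; the helper names `ry_numeric` and `ry_closed`, the Riemann-sum step `dt`, and the truncation `t_max` are illustrative assumptions.

```python
import numpy as np

def h(t, a):
    """Assumed impulse response h(t) = exp(-a t) u(t)."""
    return np.where(t >= 0, np.exp(-a * np.clip(t, 0, None)), 0.0)

def ry_numeric(tau, a=2.0, n0=1.0, dt=1e-4, t_max=20.0):
    """Riemann-sum approximation of (N0/2) * integral h(alpha) h(alpha - tau) d alpha."""
    alpha = np.arange(0.0, t_max, dt)       # h(alpha) vanishes for alpha < 0
    integrand = h(alpha, a) * h(alpha - tau, a)
    return 0.5 * n0 * np.sum(integrand) * dt

def ry_closed(tau, a=2.0, n0=1.0):
    """Closed form N0 / (4a) * exp(-a |tau|)."""
    return n0 / (4.0 * a) * np.exp(-a * abs(tau))

for tau in (-0.5, 0.0, 0.3):
    print(tau, ry_numeric(tau), ry_closed(tau))
```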


Power Spectral Density 


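The power spectral density function of a wide sense stationary (WSS)
process $X_t$ is defined to be the Fourier transform of the autocorrelation
function of $X_t$.

Equation:
$$S_X(f) = \int_{-\infty}^{\infty} R_X(\tau) e^{-i 2\pi f \tau}\,d\tau$$
if $X_t$ is WSS with autocorrelation function $R_X(\tau)$.
Properties

1. $S_X(f) = S_X(-f)$ since $R_X$ is even and real.
2. $\operatorname{Var}(X_t) = R_X(0) = \int_{-\infty}^{\infty} S_X(f)\,df$
3. $S_X(f)$ is real and nonnegative: $S_X(f) \ge 0$ for all $f$.

If $Y_t = \int_{-\infty}^{\infty} h(t - \tau) X_\tau\,d\tau$ then
Equation:
$$S_Y(f) = \mathcal{F}(R_Y(\tau)) = \mathcal{F}(\tilde{h} * h * R_X(\tau)) = \tilde{H}(f) H(f) S_X(f) = |H(f)|^2 S_X(f)$$
since $\tilde{H}(f) = \int_{-\infty}^{\infty} \tilde{h}(t) e^{-i 2\pi f t}\,dt = H(f)^*$


Example:
$X_t$ is a white process and $h(t) = e^{-at} u(t)$.
Equation:
$$H(f) = \frac{1}{a + i 2\pi f}$$
Equation:
$$S_Y(f) = \frac{N_0 / 2}{a^2 + 4\pi^2 f^2}$$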


Gaussian Processes 


Gaussian Random Processes 


Gaussian process 
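A process with mean $\mu_X(t)$ and covariance function $C_X(t_2, t_1)$ is said
to be a Gaussian process if any $X = (X_{t_1}, X_{t_2}, \ldots, X_{t_N})^T$ formed by
any sampling of the process is a Gaussian random vector, that is,
Equation:
$$f_X(x) = \frac{1}{(2\pi)^{N/2} (\det \Sigma_X)^{1/2}} e^{-\frac{1}{2}(x - \mu_X)^T \Sigma_X^{-1} (x - \mu_X)}$$
for all $x \in \mathbb{R}^N$ where
$$\mu_X = \begin{pmatrix} \mu_X(t_1) \\ \vdots \\ \mu_X(t_N) \end{pmatrix}$$
and
$$\Sigma_X = \begin{pmatrix} C_X(t_1, t_1) & \cdots & C_X(t_1, t_N) \\ \vdots & \ddots & \vdots \\ C_X(t_N, t_1) & \cdots & C_X(t_N, t_N) \end{pmatrix}$$
The complete statistical properties of $X_t$ can be obtained from the
second-order statistics.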


Properties 


1. If a Gaussian process is WSS, then it is strictly stationary. 

2. If two Gaussian processes are uncorrelated, then they are also 
Statistically independent. 

3. Any linear processing of a Gaussian process results in a Gaussian 
process. 


Example: 
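$X$ and $Y$ are Gaussian, zero mean, and independent. $Z = X + Y$ is
also Gaussian.
Equation:
$$\Phi_X(u) = \overline{e^{iuX}} = e^{-\frac{u^2}{2}\sigma_X^2}$$
for all $u \in \mathbb{R}$
Equation:
$$\Phi_Z(u) = \overline{e^{iu(X + Y)}} = e^{-\frac{u^2}{2}\sigma_X^2} e^{-\frac{u^2}{2}\sigma_Y^2} = e^{-\frac{u^2}{2}(\sigma_X^2 + \sigma_Y^2)}$$
therefore $Z$ is also Gaussian.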


Data Transmission and Reception 


We will develop the idea of data transmission by first considering simple 
channels. In additional modules, we will consider more practical channels; 
baseband channels with bandwidth constraints and passband channels. 
Simple additive white Gaussian channels 


Figure: Channel. $X_t$ carries data, $N_t$ is a white
Gaussian random process.


The concept of using different types of modulation for transmission of data 
is introduced in the module Signalling. The problem of demodulation and 
detection of signals is discussed in Demodulation and Detection. 


Signalling 


Example: 


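Data symbols are "1" or "0" and the data rate is $1/T$ Hertz.

Figure: Pulse amplitude modulation (PAM). The data sequence is mapped to a modulated signal $X_t$ taking the amplitudes $A$ and $-A$.

Figure: Pulse position modulation. The data sequence is mapped to a modulated signal $X_t$ of amplitude $A$.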


Example:
Data symbols are "1" or "0" and the data rate is $2/T$ Hertz.

Figure: each pair of data symbols (00, 01, 10, ...) is mapped to one signal $X_t$ of duration $T$, with amplitudes $A$ and $-A$.

This strategy is an alternative to PAM with half the period, $T/2$.

Figure: a second mapping of the symbol pairs 00, 01, 10, ... to signals $X_t$ of duration $T$.

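Relevant measures are energy of modulated signals
Equation:
$$\forall m,\ m \in \{1, 2, \ldots, M\} : E_m = \int_0^T s_m^2(t)\,dt$$
and how different they are in terms of inner products.
Equation:
$$\langle s_m, s_n \rangle = \int_0^T s_m(t) s_n(t)\,dt$$
for $m \in \{1, 2, \ldots, M\}$ and $n \in \{1, 2, \ldots, M\}$.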


antipodal
Signals $s_1(t)$ and $s_2(t)$ are antipodal if
$\forall t,\ t \in [0, T] : s_2(t) = -s_1(t)$


orthogonal
Signals $s_1(t), s_2(t), \ldots, s_M(t)$ are orthogonal if $\langle s_m, s_n \rangle = 0$ for
$m \ne n$.

biorthogonal
Signals $s_1(t), s_2(t), \ldots, s_M(t)$ are biorthogonal if $s_1(t), \ldots, s_{M/2}(t)$ are
orthogonal and $s_m(t) = -s_{M/2 + m}(t)$ for some $m \in \{1, 2, \ldots, M/2\}$.


It is quite intuitive to expect that the smaller (the more negative) the inner
products $\langle s_m, s_n \rangle$ for all $m \ne n$, the better the signal set.


Simplex signals 
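Let $\{s_1(t), s_2(t), \ldots, s_M(t)\}$ be a set of orthogonal signals with equal
energy. The signals $\tilde{s}_1(t), \ldots, \tilde{s}_M(t)$ are simplex signals if
Equation:
$$\tilde{s}_m(t) = s_m(t) - \frac{1}{M} \sum_{k=1}^{M} s_k(t)$$

If the energy of orthogonal signals is denoted by
Equation:
$$\forall m,\ m \in \{1, 2, \ldots, M\} : E_s = \int_0^T s_m^2(t)\,dt$$
then the energy of simplex signals
Equation:
$$E_{\tilde{s}} = \left(1 - \frac{1}{M}\right) E_s$$
and
Equation:
$$\forall m \ne n : \langle \tilde{s}_m, \tilde{s}_n \rangle = \frac{-1}{M - 1} E_{\tilde{s}}$$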

It is conjectured that among all possible M-ary signals with equal energy, 
the simplex signal set results in the smallest probability of error when used 
to transmit information through an additive white Gaussian noise channel. 


The geometric representation of signals can provide a compact description 
of signals and can simplify performance analysis of communication systems 


using the signals. 


Once signals have been modulated, the receiver must detect and demodulate 
the signals despite interference and noise and decide which of the set of 


possible transmitted signals was sent. 


Geometric Representation of Modulation Signals 


Geometric representation of signals can provide a compact characterization 
of signals and can simplify analysis of their performance as modulation 
signals. 


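Orthonormal bases are essential in geometry. Let $\{s_1(t), s_2(t), \ldots, s_M(t)\}$
be a set of signals.

Define $\psi_1(t) = \frac{s_1(t)}{\sqrt{E_1}}$ where $E_1 = \int_0^T s_1^2(t)\,dt$.

Define $s_{21} = \langle s_2, \psi_1 \rangle = \int_0^T s_2(t)\psi_1(t)\,dt$ and
$\psi_2(t) = \frac{1}{\sqrt{\hat{E}_2}}(s_2(t) - s_{21}\psi_1(t))$ where $\hat{E}_2 = \int_0^T (s_2(t) - s_{21}\psi_1(t))^2\,dt$

In general
Equation:
$$\psi_k(t) = \frac{1}{\sqrt{\hat{E}_k}} \left( s_k(t) - \sum_{j=1}^{k-1} s_{kj}\psi_j(t) \right)$$
where $\hat{E}_k = \int_0^T \left( s_k(t) - \sum_{j=1}^{k-1} s_{kj}\psi_j(t) \right)^2 dt$.

The process continues until all of the $M$ signals are exhausted. The results
are $N$ orthogonal signals with unit energy, $\{\psi_1(t), \psi_2(t), \ldots, \psi_N(t)\}$
where $N \le M$. If the signals $\{s_1(t), \ldots, s_M(t)\}$ are linearly independent,
then $N = M$.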
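
The orthonormalization procedure above can be sketched on sampled signals, with integrals over $[0, T]$ replaced by Riemann sums. This is a minimal illustration assuming NumPy; the function name `gram_schmidt`, the sampling step `dt`, and the tolerance `tol` used to discard linearly dependent signals are assumptions made for the example.

```python
import numpy as np

def gram_schmidt(signals, dt, tol=1e-9):
    """Orthonormalize sampled signals (rows of `signals`).
    Inner products are approximated by sum(x * y) * dt."""
    basis = []
    for s in signals:
        residual = s.astype(float).copy()
        for psi in basis:
            s_kj = np.sum(s * psi) * dt        # projection coefficient s_kj
            residual -= s_kj * psi             # remove the component along psi_j
        energy = np.sum(residual ** 2) * dt    # residual energy E^_k
        if energy > tol:                       # skip signals already in the span
            basis.append(residual / np.sqrt(energy))
    return np.array(basis)

# Antipodal rectangular pulses of amplitude A on [0, T): one basis signal suffices.
A, T, n = 2.0, 1.0, 1000
dt = T / n
s1 = A * np.ones(n)
s2 = -s1
psi = gram_schmidt(np.array([s1, s2]), dt)
print(psi.shape[0])                  # dimension N of the signal set
print(np.sum(s1 * psi[0]) * dt)      # s_11, expected to be close to A * sqrt(T)
```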


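The $M$ signals can be represented as
Equation:
$$s_m(t) = \sum_{n=1}^{N} s_{mn}\psi_n(t)$$
with $m \in \{1, 2, \ldots, M\}$ where $s_{mn} = \langle s_m, \psi_n \rangle$ and $E_m = \sum_{n=1}^{N} s_{mn}^2$.

The signals can be represented by $s_m = (s_{m1}, s_{m2}, \ldots, s_{mN})^T$.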
Example: 
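Figure: $s_1(t)$ is a pulse of amplitude $A$ on $[0, T]$; $s_2(t)$ is a pulse of amplitude $-A$ on $[0, T]$.
Equation:
$$\psi_1(t) = \frac{s_1(t)}{\sqrt{A^2 T}}$$
Equation:
$$s_{11} = A\sqrt{T}$$
Equation:
$$s_{21} = -A\sqrt{T}$$
Equation:
$$\psi_2(t) = (s_2(t) - s_{21}\psi_1(t)) \frac{1}{\sqrt{\hat{E}_2}} = \left(-A + \frac{A\sqrt{T}}{\sqrt{T}}\right) \frac{1}{\sqrt{\hat{E}_2}} = 0$$

Dimension of the signal set is 1 with $E_1 = s_{11}^2$ and $E_2 = s_{21}^2$.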


Example: 
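Figure: four orthogonal signals $s_1(t), s_2(t), s_3(t), s_4(t)$.

$\psi_m(t) = \frac{s_m(t)}{\sqrt{E_s}}$ where $E_s = \int_0^T s_m^2(t)\,dt$

$s_1 = (\sqrt{E_s}, 0, 0, 0)^T$, $s_2 = (0, \sqrt{E_s}, 0, 0)^T$, $s_3 = (0, 0, \sqrt{E_s}, 0)^T$, and $s_4 = (0, 0, 0, \sqrt{E_s})^T$

Equation:
$$\forall m \ne n : d_{mn} = |s_m - s_n| = \sqrt{\sum_{j=1}^{N} (s_{mj} - s_{nj})^2} = \sqrt{2E_s}$$
is the Euclidean distance between signals.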


Example: 
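Set of 4 equal energy biorthogonal signals. $s_1(t) = s(t)$, $s_2(t) = s^{\perp}(t)$,
$s_3(t) = -s(t)$, $s_4(t) = -s^{\perp}(t)$.

The orthonormal basis $\psi_1(t) = \frac{s(t)}{\sqrt{E_s}}$, $\psi_2(t) = \frac{s^{\perp}(t)}{\sqrt{E_s}}$ where
$E_s = \int_0^T s_m^2(t)\,dt$

$s_1 = (\sqrt{E_s}, 0)^T$, $s_2 = (0, \sqrt{E_s})^T$, $s_3 = (-\sqrt{E_s}, 0)^T$, $s_4 = (0, -\sqrt{E_s})^T$. The
four signals can be geometrically represented using the vectors of
projection coefficients $s_1$, $s_2$, $s_3$, and $s_4$ as a set of constellation points.

Figure: Signal constellation, with axes $\psi_1(t) = \frac{s(t)}{\sqrt{E_s}}$ and $\psi_2(t) = \frac{s^{\perp}(t)}{\sqrt{E_s}}$.

Equation:
$$d_{21} = |s_2 - s_1| = \sqrt{2E_s}$$
Equation:
$$d_{12} = d_{23} = d_{34} = d_{14}$$
Equation:
$$d_{13} = |s_1 - s_3| = 2\sqrt{E_s}$$
Equation:
$$d_{13} = d_{24}$$

Minimum distance $d_{\min} = \sqrt{2E_s}$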


Demodulation and Detection 


Consider the problem where the signal set $\{s_1, s_2, \ldots, s_M\}$, for $t \in [0, T]$, is
used to transmit $\log_2 M$ bits. The modulated signal $X_t$ could be any of
$\{s_1, s_2, \ldots, s_M\}$ during the interval $0 \le t \le T$.

$r_t = X_t + N_t = s_m(t) + N_t$ for $0 \le t \le T$ for
some $m \in \{1, 2, \ldots, M\}$.

Recall $s_m(t) = \sum_{n=1}^{N} s_{mn}\psi_n(t)$ for $m \in \{1, 2, \ldots, M\}$: the signals are
decomposed into a set of orthonormal signals, perfectly.

Noise process can also be decomposed
Equation:
$$N_t = \sum_{n=1}^{N} \eta_n \psi_n(t) + \tilde{N}_t$$
where $\eta_n = \int_0^T N_t \psi_n(t)\,dt$ is the projection onto the $n$th basis signal and $\tilde{N}_t$
is the leftover noise.

The problem of demodulation and detection is to observe $r_t$ for
$0 \le t \le T$ and decide which one of the $M$ signals was transmitted.
Demodulation is covered here. A discussion about detection can be found
here.


Demodulation 


Demodulation 


Convert the continuous time received signal into a vector without loss of 
information (or performance). 


Equation:
$$r_t = s_m(t) + N_t$$
Equation:
$$r_t = \sum_{n=1}^{N} s_{mn}\psi_n(t) + \sum_{n=1}^{N} \eta_n \psi_n(t) + \tilde{N}_t$$
Equation:
$$r_t = \sum_{n=1}^{N} (s_{mn} + \eta_n)\psi_n(t) + \tilde{N}_t$$
Equation:
$$r_t = \sum_{n=1}^{N} r_n \psi_n(t) + \tilde{N}_t$$


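The noise projection coefficients $\eta_n$'s are zero mean, Gaussian random
variables and are mutually independent if $N_t$ is a white Gaussian process.
Equation:
$$\mu_\eta(n) = E[\eta_n] = E\left[\int_0^T N_t \psi_n(t)\,dt\right]$$
Equation:
$$\mu_\eta(n) = \int_0^T E[N_t]\psi_n(t)\,dt = 0$$
Equation:
$$E[\eta_k \eta_n] = E\left[\int_0^T N_t \psi_k(t)\,dt \int_0^T N_{t'} \psi_n(t')\,dt'\right] = \int_0^T\int_0^T \overline{N_t N_{t'}}\,\psi_k(t)\psi_n(t')\,dt\,dt'$$
Equation:
$$E[\eta_k \eta_n] = \int_0^T\int_0^T R_N(t - t')\psi_k(t)\psi_n(t')\,dt\,dt'$$
Equation:
$$E[\eta_k \eta_n] = \frac{N_0}{2}\int_0^T\int_0^T \delta(t - t')\psi_k(t)\psi_n(t')\,dt\,dt'$$
Equation:
$$E[\eta_k \eta_n] = \frac{N_0}{2}\int_0^T \psi_k(t)\psi_n(t)\,dt = \frac{N_0}{2}\delta_{kn} = \begin{cases} \frac{N_0}{2} & \text{if } k = n \\ 0 & \text{if } k \ne n \end{cases}$$

The $\eta_k$'s are uncorrelated and since they are Gaussian they are also
independent. Therefore, $\eta_k \sim \text{Gaussian}(0, \frac{N_0}{2})$ and $R_\eta(k, n) = \frac{N_0}{2}\delta_{kn}$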
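
The claim that the projections of white noise onto orthonormal signals are uncorrelated with variance $\frac{N_0}{2}$ can be illustrated with a discrete-time simulation. The sketch below assumes NumPy and approximates white noise of spectral height $N_0/2$ by independent samples of variance $\frac{N_0}{2\,dt}$; the two rectangular basis signals and the name `noise_projection_cov` are illustrative assumptions.

```python
import numpy as np

def noise_projection_cov(n0=2.0, n_samples=64, trials=20000, seed=1):
    """Sample covariance of eta_n = sum_k N[k] psi_n[k] dt for two
    orthonormal rectangular basis signals on [0, T) with T = 1."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_samples
    psi = np.zeros((2, n_samples))
    psi[0, : n_samples // 2] = np.sqrt(2.0)    # unit energy on the first half
    psi[1, n_samples // 2 :] = np.sqrt(2.0)    # unit energy on the second half
    # discrete stand-in for white noise: variance (N0/2)/dt per sample
    noise = rng.normal(0.0, np.sqrt(n0 / 2.0 / dt), size=(trials, n_samples))
    eta = noise @ psi.T * dt                    # projections onto psi_1, psi_2
    return np.cov(eta, rowvar=False)

cov = noise_projection_cov()
print(cov)   # expected to be close to (N0/2) times the 2x2 identity
```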


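The $r_n$'s, the projections of the received signal $r_t$ onto the orthonormal bases
$\psi_n(t)$'s, are independent of the residual noise process $\tilde{N}_t$.

The residual noise $\tilde{N}_t$ is irrelevant to the decision process on $r_t$.

Recall $r_n = s_{mn} + \eta_n$, given $s_m(t)$ was transmitted. Therefore,
Equation:
$$\mu_r(n) = E[s_{mn} + \eta_n] = s_{mn}$$
Equation:
$$\operatorname{Var}(r_n) = \operatorname{Var}(\eta_n) = \frac{N_0}{2}$$

The correlation between $r_n$ and $\tilde{N}_t$
Equation:
$$E[\tilde{N}_t r_n] = E\left[\left(N_t - \sum_{k=1}^{N} \eta_k \psi_k(t)\right)(s_{mn} + \eta_n)\right]$$
Equation:
$$E[\tilde{N}_t r_n] = E\left[N_t - \sum_{k=1}^{N} \eta_k \psi_k(t)\right] s_{mn} + E[\eta_n N_t] - \sum_{k=1}^{N} E[\eta_k \eta_n]\psi_k(t)$$
Equation:
$$E[\tilde{N}_t r_n] = E\left[N_t \int_0^T N_{t'} \psi_n(t')\,dt'\right] - \sum_{k=1}^{N} \frac{N_0}{2}\delta_{kn}\psi_k(t)$$
Equation:
$$E[\tilde{N}_t r_n] = \int_0^T \frac{N_0}{2}\delta(t - t')\psi_n(t')\,dt' - \frac{N_0}{2}\psi_n(t)$$
Equation:
$$E[\tilde{N}_t r_n] = \frac{N_0}{2}\psi_n(t) - \frac{N_0}{2}\psi_n(t) = 0$$

Since both $\tilde{N}_t$ and $r_n$ are Gaussian, $\tilde{N}_t$ and $r_n$ are also independent.

The conjecture is to ignore $\tilde{N}_t$ and extract information from $r = (r_1, r_2, \ldots, r_N)^T$.
Knowing the vector $r$ we can reconstruct the relevant part of the random
process $r_t$ for $0 \le t \le T$
Equation:
$$r_t = s_m(t) + N_t = \sum_{n=1}^{N} r_n \psi_n(t) + \tilde{N}_t$$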

Detector 


Once the received signal has been converted to a vector, the correct 
transmitted signal must be detected based upon observations of the input 
vector. Detection is covered elsewhere.


Detection by Correlation

Figure: Demodulation and Detection (demodulator followed by detector).


Detection 


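Decide which $s_m(t)$ from the set of $\{s_1(t), \ldots, s_M(t)\}$ signals was
transmitted based on observing $r = (r_1, r_2, \ldots, r_N)^T$, the vector composed of
the demodulated received signal, that is, the vector of projections of the received
signal onto the $N$ bases.
Equation:
$$\hat{m} = \arg\max_{1 \le m \le M} \Pr[s_m(t) \text{ was transmitted} \mid r \text{ was observed}]$$

Note that
Equation:
$$\Pr[s_m \mid r] \triangleq \Pr[s_m(t) \text{ was transmitted} \mid r \text{ was observed}] = \frac{f_{r|s_m} \Pr[s_m]}{f_r}$$

If $\Pr[s_m \text{ was transmitted}] = \frac{1}{M}$, that is, information symbols are equally
likely to be transmitted, then
Equation:
$$\arg\max_{1 \le m \le M} \Pr[s_m \mid r] = \arg\max_{1 \le m \le M} f_{r|s_m}$$

Since $r_t = s_m(t) + N_t$ for $0 \le t \le T$ and for some $m \in \{1, 2, \ldots, M\}$,
then $r = s_m + \eta$ where $\eta = (\eta_1, \eta_2, \ldots, \eta_N)^T$ and the $\eta_n$'s are Gaussian and independent.
Equation:
$$f_{r|s_m} = \frac{1}{\left(2\pi \frac{N_0}{2}\right)^{N/2}} e^{-\frac{\sum_{n=1}^{N} (r_n - s_{mn})^2}{2 \frac{N_0}{2}}}$$
Equation:
$$\begin{aligned} \hat{m} &= \arg\max_{1 \le m \le M} f_{r|s_m} \\ &= \arg\max_{1 \le m \le M} \ln f_{r|s_m} \\ &= \arg\max_{1 \le m \le M} \left( -\frac{N}{2}\ln(\pi N_0) - \frac{1}{N_0}\sum_{n=1}^{N} (r_n - s_{mn})^2 \right) \\ &= \arg\min_{1 \le m \le M} \sum_{n=1}^{N} (r_n - s_{mn})^2 \end{aligned}$$
where $D(r, s_m)$ is the $\ell_2$ distance between vectors $r$ and $s_m$ defined as
$D(r, s_m) \triangleq \sum_{n=1}^{N} (r_n - s_{mn})^2$
Equation:
$$\hat{m} = \arg\min_{1 \le m \le M} D(r, s_m) = \arg\min_{1 \le m \le M} \left( \|r\|^2 - 2\langle r, s_m \rangle + \|s_m\|^2 \right)$$
where $\|r\|$ is the $\ell_2$ norm of vector $r$ defined as $\|r\| \triangleq \sqrt{\sum_{n=1}^{N} r_n^2}$
Equation:
$$\hat{m} = \arg\max_{1 \le m \le M} \left( 2\langle r, s_m \rangle - \|s_m\|^2 \right)$$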
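
The equivalence between the minimum-distance rule and the correlation metric $2\langle r, s_m \rangle - \|s_m\|^2$ can be seen on a small example. The sketch below assumes NumPy; the function names and the biorthogonal constellation with $E_s = 1$ are illustrative, and indices are 0-based (index 0 corresponds to $s_1$).

```python
import numpy as np

def detect_min_distance(r, constellation):
    """Return the index m minimizing D(r, s_m) = sum_n (r_n - s_mn)^2."""
    d = np.sum((constellation - r) ** 2, axis=1)
    return int(np.argmin(d))

def detect_correlation(r, constellation):
    """Return the index m maximizing 2<r, s_m> - ||s_m||^2."""
    metric = 2.0 * constellation @ r - np.sum(constellation ** 2, axis=1)
    return int(np.argmax(metric))

# Biorthogonal constellation with E_s = 1: s1, s2, s3 = -s1, s4 = -s2.
points = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
r = np.array([0.9, 0.2])                   # a hypothetical received vector
print(detect_min_distance(r, points), detect_correlation(r, points))
```

Both functions return the same index because $\|r\|^2$ does not depend on $m$ and drops out of the comparison.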


This type of receiver system is known as a correlation (or correlator-type) 
receiver. Examples of the use of such a system are found here. Another type of 
receiver involves linear, time-invariant filters and is known as a matched filter 
receiver. An analysis of the performance of a correlator-type receiver using 
antipodal and orthogonal binary signals can be found in Performance Analysis. 


Examples of Correlation Detection 


The implementation and theory of correlator-type receivers can be found in 
Detection. 


Example: 


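$\hat{m} = 2$ since $D(r, s_1) > D(r, s_2)$, or $\|s_1\|^2 = \|s_2\|^2$ and
$\langle r, s_2 \rangle > \langle r, s_1 \rangle$.


Example:
Data symbols "0" or "1" with equal probability. Modulator $s_1(t) = s(t)$ for
$0 \le t \le T$ and $s_2(t) = -s(t)$ for $0 \le t \le T$.

Figure: $s_1(t)$ is a pulse of amplitude $A$ on $[0, T]$; $s_2(t)$ is a pulse of amplitude $-A$ on $[0, T]$.

$\psi_1(t) = \frac{s(t)}{\sqrt{A^2 T}}$, $s_{11} = A\sqrt{T}$, and $s_{21} = -A\sqrt{T}$
Equation:
$$\forall m,\ m \in \{1, 2\} : r_t = s_m(t) + N_t$$
Equation:
$$r_1 = A\sqrt{T} + \eta_1$$
or
Equation:
$$r_1 = -A\sqrt{T} + \eta_1$$

$\eta_1$ is Gaussian with zero mean and variance $\frac{N_0}{2}$.

$\hat{m} = \arg\max \{A\sqrt{T} r_1, -A\sqrt{T} r_1\}$. Since $A\sqrt{T} > 0$ and
$\Pr[s_1] = \Pr[s_2]$, the MAP decision rule decides:
$s_1(t)$ was transmitted if $r_1 \ge 0$

$s_2(t)$ was transmitted if $r_1 < 0$

An alternate demodulator:

Equation:
$$(r_t = s_m(t) + N_t) \Rightarrow (r = s_m + \eta)$$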

Matched Filters 


Signal to Noise Ratio (SNR) at the output of the demodulator is a measure 
of the quality of the demodulator. 

Equation: 

signal energy 


SNR = ; 
noise energy 


In the correlator described earlier, $E_s = |s_m|^2$ and $\sigma_{\eta_n}^2 = \frac{N_0}{2}$. Is it
possible to design a demodulator based on linear time-invariant filters with
maximum signal-to-noise ratio?

Figure: bank of linear time-invariant filters followed by a detector.


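If $s_m(t)$ is the transmitted signal, then the output of the $k$th filter is given as
Equation:
$$\begin{aligned} y_k(t) &= \int_{-\infty}^{\infty} r_\tau h_k(t - \tau)\,d\tau \\ &= \int_{-\infty}^{\infty} (s_m(\tau) + N_\tau) h_k(t - \tau)\,d\tau \\ &= \int_{-\infty}^{\infty} s_m(\tau) h_k(t - \tau)\,d\tau + \int_{-\infty}^{\infty} N_\tau h_k(t - \tau)\,d\tau \end{aligned}$$

Sampling the output at time $T$ yields
Equation:
$$y_k(T) = \int_{-\infty}^{\infty} s_m(\tau) h_k(T - \tau)\,d\tau + \int_{-\infty}^{\infty} N_\tau h_k(T - \tau)\,d\tau$$

The noise contribution:
Equation:
$$\nu_k = \int_{-\infty}^{\infty} N_\tau h_k(T - \tau)\,d\tau$$

The expected value of the noise component is
Equation:
$$E[\nu_k] = E\left[\int_{-\infty}^{\infty} N_\tau h_k(T - \tau)\,d\tau\right] = 0$$

The variance of the noise component is the second moment since the mean
is zero and is given as
Equation:
$$\sigma(\nu_k)^2 = E[\nu_k^2] = E\left[\int_{-\infty}^{\infty} N_\tau h_k(T - \tau)\,d\tau \int_{-\infty}^{\infty} N_{\tau'} h_k(T - \tau')\,d\tau'\right]$$
Equation:
$$E[\nu_k^2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{N_0}{2}\delta(\tau - \tau') h_k(T - \tau) h_k(T - \tau')\,d\tau\,d\tau' = \frac{N_0}{2}\int_{-\infty}^{\infty} |h_k(T - \tau)|^2\,d\tau$$

Signal energy can be written as
Equation:
$$\left(\int_{-\infty}^{\infty} s_m(\tau) h_k(T - \tau)\,d\tau\right)^2$$
and the signal-to-noise ratio (SNR) as
Equation:
$$\text{SNR} = \frac{\left(\int_{-\infty}^{\infty} s_m(\tau) h_k(T - \tau)\,d\tau\right)^2}{\frac{N_0}{2}\int_{-\infty}^{\infty} |h_k(T - \tau)|^2\,d\tau}$$

The signal-to-noise ratio can be maximized by considering the well-known
Cauchy-Schwarz inequality
Equation:
$$\left(\int_{-\infty}^{\infty} g_1(x) g_2(x)\,dx\right)^2 \le \int_{-\infty}^{\infty} |g_1(x)|^2\,dx \int_{-\infty}^{\infty} |g_2(x)|^2\,dx$$
with equality when $g_1(x) = \alpha g_2(x)$. Applying the inequality directly
yields an upper bound on SNR
Equation:
$$\frac{\left(\int_{-\infty}^{\infty} s_m(\tau) h_k(T - \tau)\,d\tau\right)^2}{\frac{N_0}{2}\int_{-\infty}^{\infty} |h_k(T - \tau)|^2\,d\tau} \le \frac{2}{N_0}\int_{-\infty}^{\infty} |s_m(\tau)|^2\,d\tau$$
with equality when $\forall \tau : h_k^{\text{opt}}(T - \tau) = \alpha s_m(\tau)$. Therefore, the filter to
examine signal $m$ should be
Equation:
Matched Filter
$$\forall \tau : h_m^{\text{opt}}(\tau) = s_m(T - \tau)$$

The constant factor is not relevant when one considers the signal-to-noise
ratio. The maximum SNR is unchanged when both the numerator and
denominator are scaled.

Equation:
$$\frac{2}{N_0}\int_{-\infty}^{\infty} |s_m(\tau)|^2\,d\tau = \frac{2E_s}{N_0}$$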
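
The bound $\text{SNR} \le \frac{2E_s}{N_0}$, with equality for the matched filter, can be illustrated with sampled waveforms. The following sketch assumes NumPy; `output_snr` is a hypothetical helper, `g` stands for the time-reversed filter $g(\tau) = h(T - \tau)$, and the ramp signal and rectangular alternative filter are illustrative choices.

```python
import numpy as np

def output_snr(s, g, dt, n0):
    """SNR at the sampling instant T for a filter with g(tau) = h(T - tau):
    (integral s g)^2 / ((N0/2) integral g^2), using Riemann sums."""
    signal = (np.sum(s * g) * dt) ** 2
    noise = 0.5 * n0 * np.sum(g ** 2) * dt
    return signal / noise

T, n, n0 = 1.0, 1000, 0.5
dt = T / n
t = np.arange(n) * dt
s = t.copy()                           # assumed signal: a ramp on [0, T)
es = np.sum(s ** 2) * dt               # signal energy E_s

snr_matched = output_snr(s, s, dt, n0)          # g = s, i.e. h(t) = s(T - t)
snr_rect = output_snr(s, np.ones(n), dt, n0)    # mismatched rectangular filter
print(snr_matched, 2.0 * es / n0, snr_rect)
```

Scaling the matched filter by any nonzero constant leaves `snr_matched` unchanged, consistent with the remark about the constant factor.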


Examples involving matched filter receivers can be found here. An analysis
in the frequency domain is contained in Matched Filters in the Frequency 
Domain. 


Another type of receiver system is the correlation receiver. A performance 
analysis of both matched filters and correlator-type receivers can be found 
in Performance Analysis. 


Examples with Matched Filters 


The theory and rationale behind matched filter receivers can be found in 
Matched Filters. 


Example: 
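Figure: signals $s_1(t)$, $s_2(t)$ and the matched filters $h_1^{\text{opt}}(t)$, $h_2^{\text{opt}}(t)$.

$s_1(t) = t$ for $0 \le t \le T$

$s_2(t) = -t$ for $0 \le t \le T$

$h_1(t) = T - t$ for $0 \le t \le T$

$h_2(t) = -T + t$ for $0 \le t \le T$

Equation:
$$\forall t,\ 0 \le t \le 2T : \tilde{s}_1(t) = \int_{-\infty}^{\infty} s_1(\tau) h_1(t - \tau)\,d\tau$$
Equation:
$$\tilde{s}_1(t) = \int_0^t \tau (T - t + \tau)\,d\tau = \frac{1}{2}(T - t)t^2 + \frac{1}{3}t^3 = \frac{t^2}{2}\left(T - \frac{t}{3}\right), \quad 0 \le t \le T$$
Equation:
$$\tilde{s}_1(T) = \frac{T^3}{3}$$

Compared to the correlator-type demodulation
Equation:
$$\psi_1(t) = \frac{s_1(t)}{\sqrt{E_s}}$$
Equation:
$$s_{11}(t) = \int_0^t s_1(\tau)\psi_1(\tau)\,d\tau$$
Equation:
$$\int_0^t s_1(\tau)\psi_1(\tau)\,d\tau = \frac{1}{\sqrt{E_s}}\int_0^t \tau\,\tau\,d\tau = \frac{1}{\sqrt{E_s}}\frac{1}{3}t^3$$

Figure: matched filter output $\tilde{s}_1(t)$ and correlator output versus time.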


Example: 

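Assume binary data is transmitted at the rate of $1/T$ Hertz.
$0 \Rightarrow (b = 1) \Rightarrow (s_1(t) = s(t))$ for $0 \le t \le T$
$1 \Rightarrow (b = -1) \Rightarrow (s_2(t) = -s(t))$ for $0 \le t \le T$

Figure: matched filter output and correlator output over several symbol intervals (time axis marked at $2T$, $5T$, $6T$).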


Matched Filters in the Frequency Domain 


The time domain analysis and implementation of matched filters can be 
found in Matched Filters. 


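A frequency domain interpretation of matched filters is very useful
Equation:
$$\text{SNR} = \frac{\left(\int_{-\infty}^{\infty} s_m(\tau) h_m(T - \tau)\,d\tau\right)^2}{\frac{N_0}{2}\int_{-\infty}^{\infty} |h_m(T - \tau)|^2\,d\tau}$$

For the $m$-th filter, $h_m$, the numerator can be expressed as
Equation:
$$\tilde{s}_m(T) = \int_{-\infty}^{\infty} s_m(\tau) h_m(T - \tau)\,d\tau = \mathcal{F}^{-1}(H_m(f) S_m(f)) = \int_{-\infty}^{\infty} H_m(f) S_m(f) e^{i 2\pi f T}\,df$$
where the second equality is because $\tilde{s}_m$ is the filter output with input $S_m$
and filter $H_m$, and we can now define $\tilde{H}_m(f) = H_m(f)^* e^{-i 2\pi f T}$; then
Equation:
$$\tilde{s}_m(T) = \langle S_m(f), \tilde{H}_m(f) \rangle$$

The denominator
Equation:
$$\int_{-\infty}^{\infty} |h_m(T - \tau)|^2\,d\tau = \int_{-\infty}^{\infty} |H_m(f)|^2\,df$$
Equation:
$$\int_{-\infty}^{\infty} |H_m(f)|^2\,df = \int_{-\infty}^{\infty} |\tilde{H}_m(f)|^2\,df = \langle \tilde{H}_m(f), \tilde{H}_m(f) \rangle$$

Therefore,
Equation:
$$\text{SNR} = \frac{\langle S_m(f), \tilde{H}_m(f) \rangle^2}{\frac{N_0}{2}\langle \tilde{H}_m(f), \tilde{H}_m(f) \rangle} \le \frac{2}{N_0}\langle S_m(f), S_m(f) \rangle$$
with equality when
Equation:
$$\tilde{H}_m(f) = \alpha S_m(f)$$
or
Equation:
Matched Filter in the frequency domain
$$H_m(f) = S_m(f)^* e^{-i 2\pi f T}$$

Matched Filter
Equation:
$$\tilde{s}_m(t) = \mathcal{F}^{-1}(S_m(f) S_m(f)^*) = \int_{-\infty}^{\infty} |S_m(f)|^2 e^{i 2\pi f t}\,df = \int_{-\infty}^{\infty} |S_m(f)|^2 \cos(2\pi f t)\,df$$
where $\mathcal{F}^{-1}$ is the inverse Fourier transform operator.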


Performance Analysis 


In this section we will evaluate the probability of error of both correlator 
type receivers and matched filter receivers. We will only present the 
analysis for transmission of binary symbols. In the process we will 
demonstrate that both of these receivers have identical bit-error 
probabilities. 


Antipodal Signals 


$r_t = s_m(t) + N_t$ for $0 \le t \le T$ with $m = 1$ and $m = 2$ and
$s_1(t) = -s_2(t)$
An analysis of the performance of correlation receivers with antipodal 


binary signals can be found here. A similar analysis for matched filter 
receivers can be found here. 


Orthogonal Signals 
$r_t = s_m(t) + N_t$ for $0 \le t \le T$ with $m = 1$ and $m = 2$ and $\langle s_1, s_2 \rangle = 0$


An analysis of the performance of correlation receivers with orthogonal 
binary signals can be found here. A similar analysis for matched filter 
receivers can be found here. 


It can be shown in general that correlation and matched filter receivers
perform with the same symbol error probability if the detection criterion is
the same for both receivers.


Performance Analysis of Antipodal Binary signals with Correlation 


The bit-error probability for a correlation receiver with an antipodal signal 
set ([link]) can be found as follows: 
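Equation:
$$\begin{aligned} P_e &= \Pr[\hat{m} \ne m] \\ &= \Pr[\hat{b} \ne b] \\ &= \pi_0 \Pr[r_1 < \gamma \mid m = 1] + \pi_1 \Pr[r_1 \ge \gamma \mid m = 2] \\ &= \pi_0 \int_{-\infty}^{\gamma} f_{r_1 | s_1(t)}(r)\,dr + \pi_1 \int_{\gamma}^{\infty} f_{r_1 | s_2(t)}(r)\,dr \end{aligned}$$
if $\pi_0 = \pi_1 = 1/2$, then the optimum threshold is $\gamma = 0$.

Equation:
$$f_{r_1 | s_1(t)}(r) = \mathcal{N}\left(\sqrt{E_s}, \frac{N_0}{2}\right)$$
Equation:
$$f_{r_1 | s_2(t)}(r) = \mathcal{N}\left(-\sqrt{E_s}, \frac{N_0}{2}\right)$$

If the two symbols are equally likely to be transmitted then $\pi_0 = \pi_1 = 1/2$
and if the threshold is set to zero, then
Equation:
$$P_e = \frac{1}{2}\int_{-\infty}^{0} \frac{1}{\sqrt{2\pi \frac{N_0}{2}}} e^{-\frac{(r - \sqrt{E_s})^2}{N_0}}\,dr + \frac{1}{2}\int_{0}^{\infty} \frac{1}{\sqrt{2\pi \frac{N_0}{2}}} e^{-\frac{(r + \sqrt{E_s})^2}{N_0}}\,dr$$
Equation:
$$P_e = \frac{1}{2}\int_{-\infty}^{-\sqrt{\frac{2E_s}{N_0}}} \frac{1}{\sqrt{2\pi}} e^{-\frac{r'^2}{2}}\,dr' + \frac{1}{2}\int_{\sqrt{\frac{2E_s}{N_0}}}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{r''^2}{2}}\,dr''$$
with $r' = \frac{r - \sqrt{E_s}}{\sqrt{N_0 / 2}}$ and $r'' = \frac{r + \sqrt{E_s}}{\sqrt{N_0 / 2}}$
Equation:
$$P_e = \frac{1}{2} Q\left(\sqrt{\frac{2E_s}{N_0}}\right) + \frac{1}{2} Q\left(\sqrt{\frac{2E_s}{N_0}}\right) = Q\left(\sqrt{\frac{2E_s}{N_0}}\right)$$
where $Q(b) = \int_b^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}\,dx$.
Note that

Figure: the constellation points $-\sqrt{E_s}$ and $\sqrt{E_s}$ on a line around $0$, at distance $2\sqrt{E_s}$.

Equation:
$$P_e = Q\left(\frac{d_{12}}{\sqrt{2N_0}}\right)$$
where $d_{12} = 2\sqrt{E_s} = \|s_1 - s_2\|$ is the Euclidean distance between the
two constellation points ([link]).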
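
The expression $P_e = Q\left(\sqrt{2E_s / N_0}\right)$ can be compared against a simple simulation of the decision statistic $r_1 = \pm\sqrt{E_s} + \eta_1$ with threshold zero. This sketch assumes NumPy; the function names, the bit-to-signal mapping, and the values of $E_s$ and $N_0$ are illustrative assumptions.

```python
import math
import numpy as np

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def simulate_antipodal_ber(es=1.0, n0=0.5, n_bits=400_000, seed=2):
    """Monte Carlo bit-error rate of the threshold-at-zero detector."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=n_bits)               # 0 -> s1, 1 -> s2 (assumed mapping)
    s = np.where(bits == 0, np.sqrt(es), -np.sqrt(es))   # projection s_m1 = +/- sqrt(Es)
    r1 = s + rng.normal(0.0, np.sqrt(n0 / 2.0), size=n_bits)  # eta_1 ~ N(0, N0/2)
    decided = np.where(r1 >= 0, 0, 1)                    # decide s1 if r1 >= 0
    return float(np.mean(decided != bits))

ber = simulate_antipodal_ber()
theory = q_function(math.sqrt(2.0 * 1.0 / 0.5))          # Q(sqrt(2 Es / N0))
print(ber, theory)
```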


This is exactly the same bit-error probability as for the matched filter case. 


A similar bit-error analysis for matched filters can be found here. For the bit- 
error analysis for correlation receivers with an orthogonal signal set, refer 
here. 


Performance Analysis of Binary Orthogonal Signals with Correlation 


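Orthogonal signals with equally likely bits, $r_t = s_m(t) + N_t$ for $0 \le t \le T$, $m = 1$, $m = 2$, and
$\langle s_1, s_2 \rangle = 0$.


Correlation (correlator-type) receiver

(see [link])

Decide $s_1(t)$ was transmitted if $r_1 \ge r_2$.

Equation:
$$P_e = \Pr[\hat{m} \ne m] = \frac{1}{2}\Pr[r_1 < r_2 \mid s_1(t) \text{ transmitted}] + \frac{1}{2}\Pr[r_1 \ge r_2 \mid s_2(t) \text{ transmitted}]$$

Alternatively, if $s_1(t)$ is transmitted we decide on the wrong signal if $r_2 > r_1$, or $\eta_2 > \eta_1 + \sqrt{E_s}$, or when $\eta_2 - \eta_1 > \sqrt{E_s}$. Since $\eta_2 - \eta_1$ is Gaussian with zero mean and variance $N_0$,
Equation:
$$P_e = Q\left(\sqrt{\frac{E_s}{N_0}}\right)$$

Note that the distance between $s_1$ and $s_2$ is $d_{12} = \sqrt{2E_s}$. The average bit error probability is $P_e = Q\left(\frac{d_{12}}{\sqrt{2N_0}}\right)$, as we
had for the antipodal case. Note also that the bit-error probability is the same as for the matched filter receiver.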


Performance Analysis of Binary Antipodal Signals with Matched Filters 


Matched Filter receiver 


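Recall $r_t = s_m(t) + N_t$ where $m = 1$ or $m = 2$ and $s_1(t) = -s_2(t)$ (see
[link]).

Figure: bank of two matched filters followed by a detector.

Equation:
$$Y_1(T) = E_s + \nu_1$$
Equation:
$$Y_2(T) = -E_s + \nu_2$$

Since $s_1(t) = -s_2(t)$, $\nu_1$ is $\mathcal{N}(0, \frac{N_0}{2}E_s)$. Furthermore $\nu_2 = -\nu_1$.
Given $\nu_1$, $\nu_2$ is deterministic and equals $-\nu_1$. Then $Y_2(T) = -Y_1(T)$
if $s_1(t)$ is transmitted.

If $s_2(t)$ is transmitted
Equation:
$$Y_1(T) = -E_s + \nu_1$$
Equation:
$$Y_2(T) = E_s + \nu_2$$

$\nu_1$ is $\mathcal{N}(0, \frac{N_0}{2}E_s)$ and $\nu_2 = -\nu_1$.

The receiver can be simplified to (see [link])

Figure: simplified receiver with a single filter $s_1(T - t)$ applied to $r_t$, followed by a sampler.

If $s_1(t)$ is transmitted $Y_1(T) = E_s + \nu_1$.

If $s_2(t)$ is transmitted $Y_1(T) = -E_s + \nu_1$.

Equation:
$$\begin{aligned} P_e &= \frac{1}{2}\Pr[Y_1(T) < 0 \mid s_1(t)] + \frac{1}{2}\Pr[Y_1(T) \ge 0 \mid s_2(t)] \\ &= \frac{1}{2}\int_{-\infty}^{0} \frac{1}{\sqrt{2\pi \frac{N_0}{2}E_s}} e^{-\frac{|y - E_s|^2}{N_0 E_s}}\,dy + \frac{1}{2}\int_{0}^{\infty} \frac{1}{\sqrt{2\pi \frac{N_0}{2}E_s}} e^{-\frac{|y + E_s|^2}{N_0 E_s}}\,dy \\ &= Q\left(\frac{E_s}{\sqrt{\frac{N_0}{2}E_s}}\right) = Q\left(\sqrt{\frac{2E_s}{N_0}}\right) \end{aligned}$$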


This is the exact bit-error rate of a correlation receiver. For a bit-error 
analysis for orthogonal signals using a matched filter receiver, refer here. 


Performance Analysis of Orthogonal Binary Signals with Matched Filters 
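Equation:
$$Y = \begin{pmatrix} Y_1(T) \\ Y_2(T) \end{pmatrix}$$
If $s_1(t)$ is transmitted
Equation:
$$Y_1(T) = \int_{-\infty}^{\infty} s_1(\tau) h_1^{\text{opt}}(T - \tau)\,d\tau + \nu_1(T) = \int_{-\infty}^{\infty} s_1(\tau) s_1(\tau)\,d\tau + \nu_1(T) = E_s + \nu_1(T)$$
Equation:
$$Y_2(T) = \int_{-\infty}^{\infty} s_1(\tau) s_2(\tau)\,d\tau + \nu_2(T) = \nu_2(T)$$

If $s_2(t)$ is transmitted, $Y_1(T) = \nu_1(T)$ and $Y_2(T) = E_s + \nu_2(T)$.

Equation:
$$H_0 : Y = \begin{pmatrix} E_s \\ 0 \end{pmatrix} + \begin{pmatrix} \nu_1 \\ \nu_2 \end{pmatrix}$$
Equation:
$$H_1 : Y = \begin{pmatrix} 0 \\ E_s \end{pmatrix} + \begin{pmatrix} \nu_1 \\ \nu_2 \end{pmatrix}$$
where $\nu_1$ and $\nu_2$ are independent and Gaussian with zero mean and variance
$\frac{N_0}{2}E_s$. The analysis is identical to the correlator example.

Equation:
$$P_e = Q\left(\sqrt{\frac{E_s}{N_0}}\right)$$

Note that the maximum likelihood detector decides based on comparing $Y_1$
and $Y_2$. If $Y_1 > Y_2$ then $s_1$ was sent; otherwise $s_2$ was transmitted. For a
similar analysis for binary antipodal signals, refer here. See [link] or [link].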
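
The orthogonal-signal result $P_e = Q\left(\sqrt{E_s / N_0}\right)$ can be checked by simulating the two matched filter outputs directly. This sketch assumes NumPy and takes $s_1$ as the transmitted signal throughout, which is sufficient by symmetry; the function name and parameter values are illustrative.

```python
import math
import numpy as np

def simulate_orthogonal_ber(es=1.0, n0=0.5, trials=400_000, seed=4):
    """Monte Carlo error rate of the rule 'decide s1 if Y1 > Y2'
    when s1 is sent: Y1 = Es + nu1, Y2 = nu2."""
    rng = np.random.default_rng(seed)
    sigma = math.sqrt(0.5 * n0 * es)            # nu_k ~ N(0, (N0/2) Es)
    y1 = es + rng.normal(0.0, sigma, size=trials)
    y2 = rng.normal(0.0, sigma, size=trials)
    return float(np.mean(y2 >= y1))             # error whenever Y1 is not larger

ber_orth = simulate_orthogonal_ber()
theory_orth = 0.5 * math.erfc(math.sqrt(1.0 / 0.5) / math.sqrt(2.0))  # Q(sqrt(Es/N0))
print(ber_orth, theory_orth)
```

For the same $E_s / N_0$, this error rate is larger than the antipodal value $Q\left(\sqrt{2E_s / N_0}\right)$, reflecting the smaller distance $\sqrt{2E_s}$ between orthogonal constellation points.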




Digital Transmission over Baseband Channels 


Until this point, we have considered data transmission over simple additive
Gaussian channels that are not time or band limited. In this module we will
consider channels that do have bandwidth constraints and are limited to a
frequency range around zero (DC). The channel is best modeled as a linear filter,
where $g(t)$ is the impulse response of the baseband channel.


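Consider modulated signals $x_t = s_m(t)$ for $0 \le t \le T$ for some
$m \in \{1, 2, \dots, M\}$. The channel output is then
Equation:
$$r_t = \int_{-\infty}^{\infty} x_\tau\, g(t - \tau)\, d\tau + N_t = \int_{-\infty}^{\infty} s_m(\tau)\, g(t - \tau)\, d\tau + N_t$$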


The signal contribution in the frequency domain is
Equation:
$$\tilde{S}_m(f) = S_m(f)\, G(f)$$


The optimum matched filter should match to the filtered signal:
Equation:
$$\forall f: \quad H_m^{\mathrm{opt}}(f) = \overline{S_m(f)\, G(f)}\; e^{-j 2\pi f T}$$


This filter is indeed optimum (i.e., it maximizes signal-to-noise ratio);
however, it requires knowledge of the channel impulse response. The signal
energy is changed to

Equation:
$$\tilde{E}_s = \int_{-\infty}^{\infty} \left|\tilde{S}_m(f)\right|^2 df$$

The band-limited nature of the channel and the stream of time-limited
modulated signals cause successive symbols to overlap in time, which is referred to as
intersymbol interference (ISI). We will investigate ISI for general PAM signaling.


Pulse Amplitude Modulation Through Bandlimited Channel 
Consider a PAM system with bit stream $\dots, b_{-1}, b_0, b_1, \dots$


This implies
Equation:
$$\forall a_n \in \{M \text{ levels of amplitude}\}: \quad x_t = \sum_{n=-\infty}^{\infty} a_n\, s(t - nT)$$


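The received signal is
Equation:
$$r_t = \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} a_n\, s(t - \tau - nT)\, g(\tau)\, d\tau + N_t$$
$$= \sum_{n=-\infty}^{\infty} a_n \int_{-\infty}^{\infty} s(t - \tau - nT)\, g(\tau)\, d\tau + N_t$$
$$= \sum_{n=-\infty}^{\infty} a_n\, \tilde{s}(t - nT) + N_t$$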


Since the signals span a one-dimensional space, one filter matched to
$\tilde{s}(t) = (s * g)(t)$ is sufficient.


The matched filter's impulse response is
Equation:
$$\forall t: \quad h^{\mathrm{opt}}(t) = (s * g)(T - t)$$


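The matched filter output is
Equation:
$$y(t) = \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} a_n\, \tilde{s}(t - \tau - nT)\, h^{\mathrm{opt}}(\tau)\, d\tau + \nu(t)$$
$$= \sum_{n=-\infty}^{\infty} a_n \int_{-\infty}^{\infty} \tilde{s}(t - \tau - nT)\, h^{\mathrm{opt}}(\tau)\, d\tau + \nu(t)$$
$$= \sum_{n=-\infty}^{\infty} a_n\, u(t - nT) + \nu(t)$$

with $u(t) = (\tilde{s} * h^{\mathrm{opt}})(t)$.


The decision on the $k$-th symbol is obtained by sampling the MF output at
$kT$:
Equation:
$$y(kT) = \sum_{n=-\infty}^{\infty} a_n\, u(kT - nT) + \nu(kT)$$


The $k$-th symbol is of interest:
Equation:
$$y(kT) = a_k\, u(0) + \sum_{n=-\infty}^{\infty} a_n\, u(kT - nT) + \nu(kT)$$


where $n \neq k$.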
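
The split of $y(kT)$ into a desired term and an ISI term can be illustrated with a small sketch. The symbol values and the sampled pulse $u(mT)$ below are made up for illustration, and the noise term is omitted.

```python
def matched_filter_sample(symbols, pulse, k):
    """Split the noiseless sample y(kT) into the desired term and the ISI term.

    symbols: dict mapping index n to amplitude a_n
    pulse:   dict mapping integer m to u(mT); missing entries are treated as zero
    """
    desired = symbols[k] * pulse.get(0, 0.0)            # a_k * u(0)
    isi = sum(a_n * pulse.get(k - n, 0.0)               # sum over n != k
              for n, a_n in symbols.items() if n != k)
    return desired, isi

symbols = {0: 1.0, 1: -1.0, 2: 1.0, 3: 1.0}   # hypothetical binary PAM amplitudes
pulse = {-1: 0.1, 0: 1.0, 1: 0.25}            # hypothetical samples of u(t)
desired, isi = matched_filter_sample(symbols, pulse, k=2)
print(f"desired {desired:+.2f}, ISI {isi:+.2f}")
```

In this example the neighbouring symbols at $n = 1$ and $n = 3$ both leak into the sample at $k = 2$, so both past and future symbols contribute to the ISI.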


Since the channel is bandlimited, it provides memory for the transmission 
system. The effect of old symbols (possibly even future signals) lingers and 
affects the performance of the receiver. The effect of ISI can be eliminated 
or controlled by proper design of modulation signals or precoding filters 
at the transmitter, or by equalizers or sequence detectors at the receiver. 


Precoding and Bandlimited Signals 


Precoding 


The data symbols are manipulated such that
Equation:
$$y(kT) = a_k\, u(0) + \mathrm{ISI} + \nu(kT)$$


Design of Bandlimited Modulation Signals 


Recall that modulation signals are


Equation:
$$x_t = \sum_{n=-\infty}^{\infty} a_n\, s(t - nT)$$
We can design $s(t)$ such that
Equation:
$$u(nT) = \begin{cases} \text{large} & \text{if } n = 0 \\ \text{zero or small} & \text{if } n \neq 0 \end{cases}$$


where $y(kT) = a_k u(0) + \sum_{n \neq k} a_n u(kT - nT) + \nu(kT)$ (ISI is the
sum term, with $n \neq k$). Also, $u(nT) = (s * g * h^{\mathrm{opt}})(nT)$. The signal
$s(t)$ can be designed to have reduced ISI.


Design Equalizers at the Receiver 


Linear equalizers or decision feedback equalizers reduce ISI in the statistic
$y_t$.


Maximum Likelihood Sequence Detection 


Equation:
$$y(kT) = \sum_{n=-\infty}^{\infty} a_n\, u(kT - nT) + \nu(kT)$$


By observing $y(T), y(2T), \dots$ the data symbols are observed frequently.
Therefore, ISI can be viewed as diversity to increase performance.


Carrier Phase Modulation 


Phase Shift Keying (PSK) 
Information is impressed on the phase of the carrier. As data changes from symbol period to symbol period, the
phase shifts.
Equation:
$$\forall m \in \{1, 2, \dots, M\}: \quad s_m(t) = A P_T(t) \cos\left(2\pi f_c t + \frac{2\pi (m-1)}{M}\right)$$


Example: 
Binary s1(t) or s2(t) 


Representing the Signals 


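An orthonormal basis to represent the signals is


Equation:
$$\psi_1(t) = \frac{1}{\sqrt{E_s}}\, A P_T(t) \cos(2\pi f_c t)$$
Equation:
$$\psi_2(t) = \frac{-1}{\sqrt{E_s}}\, A P_T(t) \sin(2\pi f_c t)$$
The signal
Equation:
$$s_m(t) = A P_T(t) \cos\left(2\pi f_c t + \frac{2\pi (m-1)}{M}\right)$$
Equation:
$$s_m(t) = A \cos\left(\frac{2\pi (m-1)}{M}\right) P_T(t) \cos(2\pi f_c t) - A \sin\left(\frac{2\pi (m-1)}{M}\right) P_T(t) \sin(2\pi f_c t)$$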


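The signal energy
Equation:
$$E_s = \int_0^T A^2 P_T^2(t) \cos^2\left(2\pi f_c t + \frac{2\pi (m-1)}{M}\right) dt = \int_0^T A^2 \left(\frac{1}{2} + \frac{1}{2}\cos\left(4\pi f_c t + \frac{4\pi (m-1)}{M}\right)\right) dt$$
Equation:
$$E_s = \frac{A^2 T}{2} + \frac{A^2}{2} \int_0^T \cos\left(4\pi f_c t + \frac{4\pi (m-1)}{M}\right) dt \approx \frac{A^2 T}{2}$$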


(Note that in the above equation, the integral in the last step before the approximation is very small.) Therefore,
Equation:
$$\psi_1(t) = \sqrt{\frac{2}{T}}\, P_T(t) \cos(2\pi f_c t)$$
Equation:
$$\psi_2(t) = -\sqrt{\frac{2}{T}}\, P_T(t) \sin(2\pi f_c t)$$
In general,
Equation:
$$\forall m \in \{1, 2, \dots, M\}: \quad s_m(t) = A P_T(t) \cos\left(2\pi f_c t + \frac{2\pi (m-1)}{M}\right)$$
and with $\psi_1(t)$ and $\psi_2(t)$ as above, the signal vector is
Equation:
$$s_m = \begin{pmatrix} \sqrt{E_s} \cos\left(\frac{2\pi (m-1)}{M}\right) \\ \sqrt{E_s} \sin\left(\frac{2\pi (m-1)}{M}\right) \end{pmatrix}$$
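
As a rough numerical check of the signal-space representation, the sketch below builds the $M$ constellation points $s_m$ and integrates $s_m^2(t)$ over one symbol period to compare with $E_s = A^2 T / 2$. All names and parameter values are illustrative assumptions; the carrier is chosen with an integer number of cycles per period so that the double-frequency term vanishes.

```python
import math

def psk_constellation(num_signals, amplitude, period):
    """Return Es = A^2 T / 2 and the signal vectors s_m for m = 1, ..., M."""
    es = amplitude ** 2 * period / 2.0
    points = []
    for m in range(1, num_signals + 1):
        angle = 2.0 * math.pi * (m - 1) / num_signals
        points.append((math.sqrt(es) * math.cos(angle),
                       math.sqrt(es) * math.sin(angle)))
    return es, points

def numeric_energy(m, num_signals, amplitude, period, fc, steps=20_000):
    """Midpoint-rule integral of s_m(t)^2 over one symbol period [0, T]."""
    dt = period / steps
    phase = 2.0 * math.pi * (m - 1) / num_signals
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += (amplitude * math.cos(2.0 * math.pi * fc * t + phase)) ** 2
    return total * dt

es, points = psk_constellation(num_signals=4, amplitude=2.0, period=1.0)
energy = numeric_energy(m=2, num_signals=4, amplitude=2.0, period=1.0, fc=100.0)
print(es, energy, points)
```

Every constellation point lies on a circle of radius $\sqrt{E_s}$, so all $M$ signals have the same energy and differ only in phase.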


Demodulation and Detection 
Equation:
$$r_t = s_m(t) + N_t, \quad \text{for some } m \in \{1, 2, \dots, M\}$$
We must note that phase changes occur because of phase offset of the oscillator at the transmitter,
phase jitter, or propagation delay.
Equation:
$$r_t = A P_T(t) \cos\left(2\pi f_c t + \frac{2\pi (m-1)}{M} + \phi\right) + N_t$$


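For binary PSK, the modulation is antipodal, and the optimum receiver in AWGN has average bit-error
probability
Equation:
$$P_e = Q\left(\sqrt{\frac{2E_s}{N_0}}\right) = Q\left(\sqrt{\frac{A^2 T}{N_0}}\right)$$


Consider the receiver where
Equation:
$$r_t = \pm A P_T(t) \cos(2\pi f_c t + \phi) + N_t$$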


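The statistics


Equation:
$$r_1 = \int_0^T r_t\, \alpha \cos(2\pi f_c t + \hat{\phi})\, dt = \pm \int_0^T \alpha A \cos(2\pi f_c t + \phi) \cos(2\pi f_c t + \hat{\phi})\, dt + \int_0^T \alpha \cos(2\pi f_c t + \hat{\phi})\, N_t\, dt$$
Equation:
$$r_1 = \pm \frac{\alpha A}{2} \int_0^T \left(\cos(4\pi f_c t + \phi + \hat{\phi}) + \cos(\phi - \hat{\phi})\right) dt + \eta_1$$
Equation:
$$r_1 = \pm \frac{\alpha A}{2} T \cos(\phi - \hat{\phi}) \pm \frac{\alpha A}{2} \int_0^T \cos(4\pi f_c t + \phi + \hat{\phi})\, dt + \eta_1 \approx \pm \frac{\alpha A T}{2} \cos(\phi - \hat{\phi}) + \eta_1$$


where $\eta_1 = \alpha \int_0^T N_t \cos(\omega_c t + \hat{\phi})\, dt$, with $\omega_c = 2\pi f_c$, is zero mean Gaussian with variance $\approx \frac{\alpha^2 N_0 T}{4}$.
Therefore,

Equation:
$$P_e = Q\left(\frac{\frac{\alpha A T}{2} \cos(\phi - \hat{\phi})}{\sqrt{\frac{\alpha^2 N_0 T}{4}}}\right) = Q\left(\cos(\phi - \hat{\phi}) \sqrt{\frac{A^2 T}{N_0}}\right)$$


which is not a function of $\alpha$ and depends strongly on phase accuracy.
Equation:
$$P_e = Q\left(\cos(\phi - \hat{\phi}) \sqrt{\frac{2E_s}{N_0}}\right)$$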


The above result implies that the amplitude of the local oscillator in the correlator structure does not play a role 
in the performance of the correlation receiver. However, the accuracy of the phase does indeed play a major 
role. This point can be seen in the following example: 


Example: 
Equation: 


Cy = Al cose Qnft! + 2nfeT 
Equation: 
a, = —1'Acos 2nf.t — Inf.7' —I2nf,7 + 4 


Local oscillator should match to phase 6. 
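
The sensitivity of the error probability to the phase error $\phi - \hat{\phi}$ can be tabulated with a short sketch of $P_e = Q\left(\cos(\phi - \hat{\phi})\sqrt{2E_s/N_0}\right)$. The function names and the chosen $E_s/N_0$ value are assumptions for illustration.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bpsk_error_probability(es_over_n0, phase_error):
    """Bit-error probability of coherent BPSK with a phase error in radians."""
    return q_function(math.cos(phase_error) * math.sqrt(2.0 * es_over_n0))

es_over_n0 = 4.0                              # hypothetical Es / N0
for degrees in (0, 15, 30, 45, 60, 90, 180):
    pe = bpsk_error_probability(es_over_n0, math.radians(degrees))
    print(f"phase error {degrees:3d} deg: Pe = {pe:.3e}")
```

A phase error of $\pi/2$ removes the signal component entirely, so the detector guesses, while a phase error of $\pi$ inverts every decision.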


Differential Phase Shift Keying 


The phase-locked loop (PLL) provides estimates of the phase of the incoming
modulated signal. A phase ambiguity of exactly $\pi$ is a common occurrence
in many PLL implementations.


Therefore it is possible that $\hat{\theta} = \theta + \pi$ without the knowledge of the
receiver. Even if there is no noise, if $b = 1$ then $\hat{b} = 0$ and if $b = 0$ then
$\hat{b} = 1$.


In the presence of noise, an incorrect decision due to noise may result in a
correct final decision (in the binary case, when there is a $\pi$ phase ambiguity), with
probability:
Equation:
$$P_e = 1 - Q\left(\sqrt{\frac{2E_s}{N_0}}\right)$$
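Consider a stream of bits $a_n \in \{0, 1\}$ and BPSK modulated signal
Equation:
$$\sum_n (-1)^{a_n} A P_T(t - nT) \cos(2\pi f_c t + \theta)$$


In differential PSK, the transmitted bits are first encoded as $b_n = a_n \oplus b_{n-1}$,
with initial symbol (e.g. $b_0$) chosen without loss of generality to be either 0
or 1.


Transmitted DPSK signals
Equation:
$$\sum_n (-1)^{b_n} A P_T(t - nT) \cos(2\pi f_c t + \theta)$$


The decoder can be constructed as


Equation:
$$\hat{a}_n = \hat{b}_n \oplus \hat{b}_{n-1}$$

If two consecutive bits are detected correctly, that is, if $\hat{b}_n = b_n$ and $\hat{b}_{n-1} = b_{n-1}$,
then
Equation:
$$\hat{a}_n = \hat{b}_n \oplus \hat{b}_{n-1} = b_n \oplus b_{n-1} = a_n \oplus b_{n-1} \oplus b_{n-1} = a_n$$

If $\hat{b}_n = b_n \oplus 1$ and $\hat{b}_{n-1} = b_{n-1} \oplus 1$, that is, two consecutive bits are
detected incorrectly, then
Equation:
$$\hat{a}_n = \hat{b}_n \oplus \hat{b}_{n-1} = b_n \oplus 1 \oplus b_{n-1} \oplus 1 = b_n \oplus b_{n-1} \oplus 1 \oplus 1 = b_n \oplus b_{n-1} = a_n$$

If $\hat{b}_n = b_n \oplus 1$ and $\hat{b}_{n-1} = b_{n-1}$, that is, one of two consecutive bits is
detected in error, then there will be an error, and the probability of
that error for DPSK is
Equation:
$$P_e = \Pr[\hat{a}_n \neq a_n] = \Pr[\hat{b}_n = b_n, \hat{b}_{n-1} \neq b_{n-1}] + \Pr[\hat{b}_n \neq b_n, \hat{b}_{n-1} = b_{n-1}]$$
$$= 2\, Q\left(\sqrt{\frac{2E_s}{N_0}}\right) \left[1 - Q\left(\sqrt{\frac{2E_s}{N_0}}\right)\right] \approx 2\, Q\left(\sqrt{\frac{2E_s}{N_0}}\right)$$


This approximation holds if $Q$ is small.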
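
The differential encoding and decoding rules can be sketched in a few lines. The example below (with a made-up bit stream) shows that inverting every detected bit, as a $\pi$ phase ambiguity would, leaves the decoded bits unchanged, while a single detection error corrupts two decoded bits.

```python
def dpsk_encode(bits, initial=0):
    """Differentially encode: b_n = a_n XOR b_(n-1), starting from b_0 = initial."""
    encoded = [initial]
    for a in bits:
        encoded.append(a ^ encoded[-1])
    return encoded

def dpsk_decode(detected):
    """Recover a_n = b_n XOR b_(n-1) from the detected encoded bits."""
    return [detected[n] ^ detected[n - 1] for n in range(1, len(detected))]

bits = [1, 0, 1, 1, 0, 0, 1]                  # hypothetical information bits a_n
encoded = dpsk_encode(bits)

inverted = [b ^ 1 for b in encoded]           # pi phase ambiguity flips every b_n
single_error = list(encoded)
single_error[3] ^= 1                          # one detection error in b_3

print(dpsk_decode(encoded) == bits)           # decoding without errors
print(dpsk_decode(inverted) == bits)          # decoding under full inversion
print(sum(x != y for x, y in zip(dpsk_decode(single_error), bits)))
```

The doubling of errors after a single detection error is the source of the factor of 2 in the DPSK error probability.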


Carrier Frequency Modulation 


Frequency Shift Keying (FSK) 


The data is impressed upon the carrier frequency. Therefore, the $M$ different signals are
Equation:
$$s_m(t) = A P_T(t) \cos\left(2\pi f_c t + 2\pi (m-1) \Delta f\, t + \theta_m\right)$$


for $m \in \{1, 2, \dots, M\}$.


The $M$ different signals have $M$ different carrier frequencies with possibly different phase angles since the
generators of these carrier signals may be different. The carriers are


Equation:
$$f_1 = f_c$$
$$f_2 = f_c + \Delta f$$
$$\vdots$$
$$f_M = f_c + (M-1) \Delta f$$
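Thus, the $M$ signals may be designed to be orthogonal to each other.
Equation:
$$\langle s_m, s_n \rangle = \int_0^T A^2 \cos\left(2\pi f_c t + 2\pi (m-1) \Delta f\, t + \theta_m\right) \cos\left(2\pi f_c t + 2\pi (n-1) \Delta f\, t + \theta_n\right) dt$$
$$= \frac{A^2}{2} \int_0^T \cos\left(4\pi f_c t + 2\pi (n+m-2) \Delta f\, t + \theta_m + \theta_n\right) dt + \frac{A^2}{2} \int_0^T \cos\left(2\pi (m-n) \Delta f\, t + \theta_m - \theta_n\right) dt$$
$$= \frac{A^2}{2}\, \frac{\sin\left(4\pi f_c T + 2\pi (n+m-2) \Delta f\, T + \theta_m + \theta_n\right) - \sin(\theta_m + \theta_n)}{4\pi f_c + 2\pi (n+m-2) \Delta f} + \frac{A^2}{2} \left(\frac{\sin\left(2\pi (m-n) \Delta f\, T + \theta_m - \theta_n\right)}{2\pi (m-n) \Delta f} - \frac{\sin(\theta_m - \theta_n)}{2\pi (m-n) \Delta f}\right)$$


If $2 f_c T + (n+m-2) \Delta f\, T$ is an integer, and if $(m-n) \Delta f\, T$ is also an integer, then $\langle s_m, s_n \rangle = 0$. If
$\Delta f\, T$ is an integer, then $\langle s_m, s_n \rangle \approx 0$ when $f_c$ is much larger than $\frac{1}{T}$.


In case $\forall m: \theta_m = 0$,


Equation:
$$\langle s_m, s_n \rangle \approx \frac{A^2 T}{2}\, \mathrm{sinc}\left(2 (m-n) \Delta f\, T\right)$$
Therefore, the frequency spacing could be as small as $\Delta f = \frac{1}{2T}$ since $\mathrm{sinc}(x) = 0$ if $x = \pm 1$ or $\pm 2$.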


If the signals are designed to be orthogonal then the average probability of error for binary FSK with optimum
receiver is
Equation:
$$P_e = Q\left(\sqrt{\frac{E_s}{N_0}}\right)$$


in AWGN.


Note that $\mathrm{sinc}(x)$ takes its minimum value not at $x = \pm 1$ but at $x \approx \pm 1.4$, and the minimum value is $-0.216$.
Therefore if $\Delta f = \frac{0.7}{T}$ then


Equation:
$$P_e = Q\left(\sqrt{\frac{1.216\, E_s}{N_0}}\right)$$


which is a gain of $10 \log_{10} 1.216 \approx 0.85$ dB over orthogonal FSK.
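
The numbers quoted above can be reproduced with a simple grid search over $x = 2 \Delta f\, T$. This is only a numerical sketch, assuming the normalized definition $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$ and zero phases $\theta_m = 0$; the function names are illustrative.

```python
import math

def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def correlation_coefficient(delta_f, period):
    """Normalized inner product of two adjacent FSK tones with zero phases."""
    return sinc(2.0 * delta_f * period)

# Search x = 2 * delta_f * T on (0, 3] for the most negative correlation.
grid = [i / 10000.0 for i in range(1, 30001)]
x_min = min(grid, key=sinc)
rho_min = sinc(x_min)
gain_db = 10.0 * math.log10(1.0 - rho_min)

print(f"x_min = {x_min:.4f}, rho_min = {rho_min:.4f}, gain = {gain_db:.2f} dB")
print(f"correlation at delta_f = 1/(2T): {correlation_coefficient(0.5, 1.0):.2e}")
```

The search confirms that a spacing slightly wider than the minimum orthogonal spacing makes the two tones negatively correlated, which is what yields the small gain over orthogonal FSK.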


Information Theory and Coding 


In the previous chapters, we considered the problem of digital transmission 
over different channels. Information sources are not often digital, and in 
fact, many sources are analog. Although many channels are also analog, it 
is still more efficient to convert analog sources into digital data and transmit 
over analog channels using digital transmission techniques. There are two 
reasons why digital transmission could be more efficient and more reliable 
than analog transmission: 


1. Analog sources could be compressed to digital form efficiently. 
2. Digital data can be transmitted over noisy channels reliably. 


There are several key questions that need to be addressed: 


1. How can one model information? 

2. How can one quantify information? 

3. If information can be measured, does its information quantity relate to 
how much it can be compressed? 

4. Is it possible to determine if a particular channel can handle 
transmission of a source with a particular information quantity? 


Example:

The information content of the following sentences, "Hello, hello, hello."
and "There is an exam today.", is not the same. Clearly the second one
carries more information. The first one can be compressed to "Hello"
without much loss of information.


In other modules, we will quantify information and find efficient 
representation of information (Entropy). We will also quantify how much 
information can be transmitted through channels, reliably. Channel coding 
can be used to reduce information rate and increase reliability. 


Entropy 


Information sources take very different forms. Since the information is not known 
to the destination, it is then best modeled as a random process, discrete-time or 
continuous time. 


Here are a few examples: 


Digital data source (e.g., a text) can be modeled as a discrete-time and discrete
valued random process $X_1, X_2, \dots$, where $X_i \in \{A, B, C, D, E, \dots\}$ with a
particular $p_{X_1}(x), p_{X_2}(x), \dots$, and a specific $p_{X_1 X_2}, p_{X_2 X_3}, \dots$, and $p_{X_1 X_2 X_3}, p_{X_2 X_3 X_4}, \dots$,
etc.

Video signals can be modeled as a continuous time random process. The 
power spectral density is bandlimited to around 5 MHz (the value depends on 
the standards used to raster the frames of image). 

Audio signals can be modeled as a continuous-time random process. It has 
been demonstrated that the power spectral density of speech signals is 
bandlimited between 300 Hz and 3400 Hz. For example, the speech signal can 
be modeled as a Gaussian process with the shown power spectral density over 
a small observation period. 


[Figure: power spectral density of a speech signal, occupying the band from 300 Hz to 3400 Hz]


These analog information signals are bandlimited. Therefore, if sampled faster than 
the Nyquist rate, they can be reconstructed from their sample values. 


Example: 
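A speech signal with bandwidth of 3100 Hz can be sampled at the rate of 6.2 kHz.
If the samples are quantized with an 8-level quantizer then the speech signal can be
represented with a binary sequence with the rate of
Equation:
$$6.2 \times 10^3\ \frac{\text{samples}}{\text{sec}} \times \log_2 8\ \frac{\text{bits}}{\text{sample}} = 18600\ \frac{\text{bits}}{\text{sec}} = 18.6\ \frac{\text{kbits}}{\text{sec}}$$


[Figure: a speech signal sampled every $\frac{1}{6.2 \times 10^3}$ seconds and represented by a binary sequence such as 0011011010111100]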


The sampled real values can be quantized to create a discrete-time discrete-valued 
random process. Since any bandlimited analog information signal can be 
converted to a sequence of discrete random variables, we will continue the 
discussion only for discrete random variables. 


Example: 

The random variable $x$ takes the value of 0 with probability 0.9 and the value of 1
with probability 0.1. The statement that $x = 1$ carries more information than the
statement that $x = 0$. The reason is that $x$ is expected to be 0; therefore, knowing
that $x = 1$ is more surprising news. An intuitive definition of information
measure should be larger when the probability is small.


Example: 

The information content in the statement about the temperature and pollution level 
on July 15th in Chicago should be the sum of the information that July 15th in 
Chicago was hot and highly polluted since pollution and temperature could be 
independent. 

Equation: 


I(hot, high) = I(hot) + I(high) 


An intuitive and meaningful measure of information should have the following 
properties: 


1. Self information should decrease with increasing probability. 
2. Self information of two independent events should be their sum. 
3. Self information should be a continuous function of the probability. 


The only function satisfying the above conditions is the negative logarithm of the probability, up to the choice of the logarithm base.


Entropy 
The entropy (average self information) of a discrete random variable $X$ is a
function of its probability mass function and is defined as
Equation:
$$H(X) = -\sum_{i=1}^{N} p_X(x_i) \log p_X(x_i)$$


where $N$ is the number of possible values of $X$ and $p_X(x_i) = \Pr[X = x_i]$.
If log is base 2 then the unit of entropy is bits. Entropy is a measure of
uncertainty in a random variable and a measure of information it can reveal.
A more basic explanation of entropy is provided in another module.


Example: 

If a source produces binary information $\{0, 1\}$ with probabilities $p$ and $1 - p$, the
entropy of the source is

Equation:
$$H(X) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$


If $p = 0$ then $H(X) = 0$, if $p = 1$ then $H(X) = 0$, and if $p = 1/2$ then $H(X) = 1$
bit. The source has its largest entropy if $p = 1/2$ and the source provides no new
information if $p = 0$ or $p = 1$.

Example:
An analog source is modeled as a continuous-time random process with power
spectral density bandlimited to the band between 0 and 4000 Hz. The signal is
sampled at the Nyquist rate. The sequence of random variables, as a result of
sampling, are assumed to be independent. The samples are quantized to 5 levels
$\{-2, -1, 0, 1, 2\}$. The probabilities of the samples taking the quantized values are
$\left\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{16}\right\}$, respectively. The entropy of the random variables is
Equation:
$$H(X) = -\frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{4} \log_2 \frac{1}{4} - \frac{1}{8} \log_2 \frac{1}{8} - \frac{1}{16} \log_2 \frac{1}{16} - \frac{1}{16} \log_2 \frac{1}{16}$$
$$= \frac{1}{2} \log_2 2 + \frac{1}{4} \log_2 4 + \frac{1}{8} \log_2 8 + \frac{1}{16} \log_2 16 + \frac{1}{16} \log_2 16$$
$$= \frac{1}{2} + \frac{1}{2} + \frac{3}{8} + \frac{4}{8} = \frac{15}{8}\ \frac{\text{bits}}{\text{sample}}$$


There are 8000 samples per second. Therefore, the source produces
$8000 \times \frac{15}{8} = 15000$ bits/sec of information.
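
The entropy values in the two examples above can be reproduced with a short sketch of the definition. The helper name `entropy` is an assumption; terms with zero probability are skipped, following the usual convention that $0 \log 0 = 0$.

```python
import math

def entropy(pmf):
    """Entropy in bits of a probability mass function given as a sequence."""
    return -sum(p * math.log2(p) for p in pmf if p > 0.0)

# Binary source with probabilities p and 1 - p.
for p in (0.0, 0.1, 0.5, 1.0):
    print(f"p = {p}: H = {entropy([p, 1.0 - p]):.4f} bits")

# Five-level quantizer example: entropy per sample and information rate.
h_sample = entropy([1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 16])
rate = 8000 * h_sample                     # 8000 samples per second
print(f"H = {h_sample} bits/sample, rate = {rate} bits/sec")
```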


Joint Entropy 
The joint entropy of two discrete random variables $(X, Y)$ is defined by
Equation:
$$H(X, Y) = -\sum_{i} \sum_{j} p_{X,Y}(x_i, y_j) \log p_{X,Y}(x_i, y_j)$$


The joint entropy for a random vector $X = (X_1, X_2, \dots, X_n)^T$ is defined as
Equation:
$$H(X) = -\sum_{x_1} \sum_{x_2} \cdots \sum_{x_n} p_X(x_1, x_2, \dots, x_n) \log p_X(x_1, x_2, \dots, x_n)$$


Conditional Entropy 
The conditional entropy of the random variable $X$ given the random variable
$Y$ is defined by
Equation:
$$H(X|Y) = -\sum_{i} \sum_{j} p_{X,Y}(x_i, y_j) \log p_{X|Y}(x_i | y_j)$$


It is easy to show that
Equation:
$$H(X) = H(X_1) + H(X_2 | X_1) + \cdots + H(X_n | X_1 X_2 \dots X_{n-1})$$

and

Equation:
$$H(X, Y) = H(Y) + H(X|Y)$$


If $X_1, X_2, \dots, X_n$ are mutually independent it is easy to show that
Equation:
$$H(X) = \sum_{i=1}^{n} H(X_i)$$
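
The identity $H(X, Y) = H(Y) + H(X|Y)$ can be verified numerically on a small joint distribution. The joint pmf below is a made-up example, and the helper names are assumptions.

```python
import math

def entropy(pmf_values):
    """Entropy in bits of the probabilities in pmf_values."""
    return -sum(p * math.log2(p) for p in pmf_values if p > 0.0)

def marginal_of_y(joint):
    """Marginal pmf of Y from a joint pmf given as {(x, y): probability}."""
    marginal = {}
    for (_, y), p in joint.items():
        marginal[y] = marginal.get(y, 0.0) + p
    return marginal

def conditional_entropy_x_given_y(joint):
    """H(X|Y) = -sum over (x, y) of p(x, y) * log2 p(x|y)."""
    p_y = marginal_of_y(joint)
    return -sum(p * math.log2(p / p_y[y])
                for (_, y), p in joint.items() if p > 0.0)

joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}   # hypothetical joint pmf
h_xy = entropy(joint.values())
h_y = entropy(marginal_of_y(joint).values())
h_x_given_y = conditional_entropy_x_given_y(joint)
print(h_xy, h_y + h_x_given_y)
```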


Entropy Rate 
The entropy rate of a stationary discrete-time random process is defined by
Equation:
$$H = \lim_{n \to \infty} H(X_n | X_1 X_2 \dots X_{n-1})$$


The limit exists and is equal to
Equation:
$$H = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n)$$
The entropy rate is a measure of the uncertainty of information content per
output symbol of the source.


Entropy is closely tied to source coding. The extent to which a source can be 
compressed is related to its entropy. In 1948, Claude E. Shannon introduced a 
theorem which related the entropy to the number of bits per second required to 
represent a source without much loss. 


Source Coding 


As mentioned earlier, how much a source can be compressed should be 
related to its entropy. In 1948, Claude E. Shannon introduced three 
theorems and developed very rigorous mathematics for digital 
communications. In one of the three theorems, Shannon relates entropy to 
the minimum number of bits per second required to represent a source 
without much loss (or distortion). 


Consider a source that is modeled by a discrete-time and discrete-valued
random process $X_1, X_2, \dots, X_n, \dots$ where $x_i \in \{a_1, a_2, \dots, a_N\}$ and
define $p_{X_i}(x_i = a_j) = p_j$ for $j = 1, 2, \dots, N$, where it is assumed that $X_1,
X_2, \dots, X_n$ are mutually independent and identically distributed.


Consider a sequence of length $n$
Equation:
$$X = (X_1, X_2, \dots, X_n)^T$$


The symbol $a_1$ can occur with probability $p_1$. Therefore, in a sequence of
length $n$, on the average, $a_1$ will appear $n p_1$ times with high probability if
$n$ is very large.


Therefore,
Equation:
$$P(X = x) = p_{X_1}(x_1)\, p_{X_2}(x_2) \cdots p_{X_n}(x_n)$$
Equation:
$$P(X = x) \approx p_1^{n p_1}\, p_2^{n p_2} \cdots p_N^{n p_N} = \prod_{i=1}^{N} p_i^{n p_i}$$


where $p_i = P(X_j = a_i)$ for all $j$ and for all $i$.


A typical sequence $x$ may look like
Equation:
$$x = (a_2, a_1, a_N, a_2, \dots, a_1, a_N, a_6)^T$$


where $a_i$ appears $n p_i$ times with large probability. This is referred to as a
typical sequence. The probability of $X$ being a typical sequence is
Equation:
$$P(X = x) \approx \prod_{i=1}^{N} p_i^{n p_i} = \prod_{i=1}^{N} \left(2^{\log_2 p_i}\right)^{n p_i} = \prod_{i=1}^{N} 2^{n p_i \log_2 p_i} = 2^{n \sum_{i=1}^{N} p_i \log_2 p_i} = 2^{-n H(X)}$$


where $H(X)$ is the entropy of the random variables $X_1, X_2, \dots, X_n$.


For large $n$, almost all the output sequences of length $n$ of the source are
equally probable with probability $\approx 2^{-n H(X)}$. These are typical
sequences. The probability of nontypical sequences is negligible. There
are $N^n$ different sequences of length $n$ with alphabet of size $N$. The
probability of typical sequences is almost 1.

Equation:
$$\sum_{k=1}^{\#\text{ of typical seq.}} 2^{-n H(X)} \approx 1$$

so the number of typical sequences is approximately $2^{n H(X)}$.
Example: 
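Consider a source with alphabet {A,B,C,D} with probabilities $\left\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}\right\}$.
Assume $X_1, X_2, \dots, X_8$ is an independent and identically distributed
sequence with $X_i \in \{A, B, C, D\}$ with the above probabilities.
Equation:
$$H(X) = -\frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{4} \log_2 \frac{1}{4} - \frac{1}{8} \log_2 \frac{1}{8} - \frac{1}{8} \log_2 \frac{1}{8} = \frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \frac{3}{8} = \frac{14}{8}$$


The number of typical sequences of length 8
Equation:
$$2^{8 \times \frac{14}{8}} = 2^{14}$$


The number of nontypical sequences
Equation:
$$4^8 - 2^{14} = 2^{16} - 2^{14} = 2^{14} (4 - 1) = 3 \times 2^{14}$$

Examples of typical sequences include those with A appearing $8 \times \frac{1}{2} = 4$
times, B appearing $8 \times \frac{1}{4} = 2$ times, etc.: {A,D,B,B,A,A,C,A},
{A,A,A,A,C,D,B,B} and many more.

Examples of nontypical sequences of length 8: {D,D,B,C,C,A,B,D},
{C,C,C,C,C,B,C,C} and many more. Indeed, these definitions and
arguments are valid when $n$ is very large. The probability of a source
output to be in the set of typical sequences approaches 1 as $n \to \infty$. The
probability of a source output to be in the set of nontypical sequences
approaches 0 as $n \to \infty$.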
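
The probabilities in this example can be checked directly. The sketch below computes the source entropy, the probability of one of the listed typical sequences (which should equal $2^{-nH(X)} = 2^{-14}$), and the probability of one of the listed nontypical sequences. The helper names are assumptions.

```python
import math

PROBS = {"A": 1 / 2, "B": 1 / 4, "C": 1 / 8, "D": 1 / 8}

def source_entropy(probs):
    """Entropy in bits per source output."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0.0)

def sequence_probability(sequence, probs):
    """Probability of an i.i.d. sequence: the product of its symbol probabilities."""
    return math.prod(probs[symbol] for symbol in sequence)

n = 8
h = source_entropy(PROBS)
p_typical = sequence_probability("AAAACDBB", PROBS)
p_nontypical = sequence_probability("CCCCCBCC", PROBS)

print(f"H(X) = {h} bits, 2^(-nH) = {2.0 ** (-n * h)}")
print(f"typical: {p_typical}, nontypical: {p_nontypical}")
print(f"nontypical count by the text's estimate: {4 ** n - 2 ** 14}")
```

Each sequence with exactly the expected symbol counts has probability $2^{-14}$, while the count $2^{14}$ of typical sequences is an asymptotic estimate that becomes meaningful only for large $n$.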


The essence of source coding or data compression is that as $n \to \infty$,
nontypical sequences almost never appear as the output of the source. Therefore,
one only needs to be able to represent typical sequences as binary codes and
ignore nontypical sequences. Since there are only $2^{n H(X)}$ typical sequences
of length $n$, it takes $n H(X)$ bits to represent them on the average. On the
average it takes $H(X)$ bits per source output to represent a simple source
that produces independent and identically distributed outputs.

Theorem: Shannon's Source-Coding Theorem


A source that produces independent and identically distributed random
variables with entropy $H$ can be encoded with arbitrarily small error
probability at any rate $R$ in bits per source output if $R > H$. Conversely, if
$R < H$, the error probability will be bounded away from zero, independent
of the complexity of coder and decoder.


The source coding theorem proves existence of source coding techniques 
that achieve rates close to the entropy but does not provide any algorithms 
or ways to construct such codes. 


If the source is not i.i.d. (independent and identically distributed), but it is
stationary with memory, then a similar theorem applies with the entropy
$H(X)$ replaced with the entropy rate $H = \lim_{n \to \infty} H(X_n | X_1 X_2 \dots X_{n-1})$.


In the case of a source with memory, the more the source produces outputs 
the more one knows about the source and the more one can compress. 


Example: 

The English language has 26 letters, with space it becomes an alphabet of 
size 27. If modeled as a memoryless source (no dependency between 
letters in a word) then the entropy is H(X) = 4.03 bits/letter. 

If the dependency between letters in a text is captured in a model the
entropy rate can be derived to be $H = 1.3$ bits/letter. Note that a
non-information theoretic representation of a text may require 5 bits/letter since
$2^5 = 32$ is the smallest power of 2 that is at least 27. Shannon's results indicate that there may
be a compression algorithm with the rate of 1.3 bits/letter.


Although Shannon's results are not constructive, there are a number of 
source coding algorithms for discrete time discrete valued sources that 
come close to Shannon's bound. One such algorithm is the Huffman source 
coding algorithm. Another is the Lempel and Ziv algorithm. 


Huffman codes and Lempel and Ziv apply to compression problems where
the source produces discrete time and discrete valued outputs. For cases
where the source is analog there are powerful compression algorithms that
specify all the steps from sampling and quantization to binary
representation. These are referred to as waveform coders. JPEG, MPEG, and
vocoders are a few examples for image, video, and voice, respectively.


Huffman Coding 


One particular source coding algorithm is the Huffman encoding algorithm. 
It is a source coding algorithm which approaches, and sometimes achieves, 
Shannon's bound for source compression. A brief discussion of the 
algorithm is also given in another module. 


Huffman encoding algorithm 


1. Sort source outputs in decreasing order of their probabilities.

2. Merge the two least-probable outputs into a single output whose
probability is the sum of the corresponding probabilities.

3. If the number of remaining outputs is more than 2, then go to step 1.

4. Arbitrarily assign 0 and 1 as codewords for the two remaining outputs.

5. If an output is the result of the merger of two outputs in a preceding
step, append a 0 and a 1 to the current codeword to obtain the
codewords for the two preceding outputs, and repeat step 5. If no output is
preceded by another output in a preceding step, then stop.


Example: 
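$X \in \{A, B, C, D\}$ with probabilities $\left\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}\right\}$

Symbol   Probability   Codeword
A        1/2           0
B        1/4           10
C        1/8           110
D        1/8           111


Average length $= \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 3 + \frac{1}{8} \cdot 3 = \frac{14}{8}$. As you may recall, the
entropy of the source was also $H(X) = \frac{14}{8}$. In this case, the Huffman
code achieves the lower bound of $\frac{14}{8}$ bits per output.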
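
The merging procedure can be sketched with a priority queue. This is one possible implementation, not necessarily the construction used to draw the original figure: it repeatedly merges the two least-probable entries and prepends a bit to every codeword in each merged group. The function name and the tie-breaking counter are implementation choices made for this sketch.

```python
import heapq
import itertools

def huffman_code(probs):
    """Return {symbol: codeword} by repeatedly merging the two least-probable entries."""
    counter = itertools.count()            # tie-breaker so dicts are never compared
    heap = [(p, next(counter), {symbol: ""}) for symbol, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

probs = {"A": 1 / 2, "B": 1 / 4, "C": 1 / 8, "D": 1 / 8}
code = huffman_code(probs)
average_length = sum(probs[s] * len(code[s]) for s in probs)
print(code, average_length)
```

Which of the two merged groups receives the 0 and which the 1 is arbitrary, so other valid Huffman codes for the same source differ in their bit patterns but not in their codeword lengths.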


In general, we can define average code length as
Equation:
$$\bar{\ell} = \sum_{x \in \mathcal{X}} p_X(x)\, \ell(x)$$


where $\mathcal{X}$ is the set of possible values of $x$.
It is not very hard to show that
Equation:
$$H(X) \le \bar{\ell} < H(X) + 1$$
For compressing a single source output at a time, Huffman codes provide
nearly optimum code lengths.
The drawbacks of Huffman coding


1. Codes are variable length.
2. The algorithm requires the knowledge of the probabilities, $p_X(x)$ for
all $x \in \mathcal{X}$.


Another powerful source coder that does not have the above shortcomings
is the Lempel-Ziv algorithm.


Channel Capacity 


In the previous section, we discussed information sources and quantified 
information. We also discussed how to represent (and compress) 
information sources in binary symbols in an efficient manner. In this 
section, we consider channels and will find out how much information can 
be sent through the channel reliably. 


We will first consider simple channels where the input is a discrete random
variable and the output is also a discrete random variable. These discrete
channels could represent analog channels with modulation and
demodulation and detection.


Discrete Channel 


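Let us denote the input sequence to the channel as
Equation:
$$X = (X_1, X_2, \dots, X_n)^T$$


where $X_i \in \mathcal{X}$, a discrete symbol set or input alphabet.


The channel output
Equation:
$$Y = (Y_1, Y_2, \dots, Y_n)^T$$


where $Y_i \in \mathcal{Y}$, a discrete symbol set or output alphabet.


The statistical properties of a channel are determined if one finds
$p_{Y|X}(y|x)$ for all $y \in \mathcal{Y}^n$ and for all $x \in \mathcal{X}^n$. A discrete channel is called
a discrete memoryless channel if
Equation:
$$p_{Y|X}(y|x) = \prod_{i=1}^{n} p_{Y_i|X_i}(y_i | x_i)$$
for all $y \in \mathcal{Y}^n$ and for all $x \in \mathcal{X}^n$.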


Example: 
A binary symmetric channel (BSC) is a discrete memoryless channel with
binary input and binary output and $p_{Y|X}(y = 0 | x = 1) = p_{Y|X}(y = 1 | x = 0)$.


As an example, a white Gaussian channel with antipodal signaling and
matched filter receiver has probability of error of $Q\left(\sqrt{\frac{2E_s}{N_0}}\right)$. Since the
error is symmetric with respect to the transmitted bit, then
Equation:
$$p_{Y|X}(0|1) = p_{Y|X}(1|0) = Q\left(\sqrt{\frac{2E_s}{N_0}}\right) = \epsilon$$


It is interesting to note that every time a BSC is used one bit is sent across
the channel with probability of error of $\epsilon$. The question is how much
information or how many bits can be sent per channel use, reliably. Before
we consider the above question a few definitions are essential. These are
discussed in mutual information.


Mutual Information 


Recall that
Equation:
$$H(X, Y) = -\sum_{x} \sum_{y} p_{X,Y}(x, y) \log p_{X,Y}(x, y)$$


Equation:
$$H(Y) + H(X|Y) = H(X) + H(Y|X)$$


Mutual Information
The mutual information between two discrete random variables is
denoted by $I(X; Y)$ and defined as
Equation:
$$I(X; Y) = H(X) - H(X|Y)$$


Mutual information is a useful concept to measure the amount of 
information shared between input and output of noisy channels. 


In our previous discussions it became clear that when the channel is noisy 
there may not be reliable communications. Therefore, the limiting factor 
could very well be reliability when one considers noisy channels. Claude E. 
Shannon in 1948 changed this paradigm and stated a theorem that presents 
the rate (speed of communication) as the limiting factor as opposed to 
reliability. 


Example: 
Consider a discrete memoryless channel with four possible inputs and 
outputs. 


[Figure: transition diagram of a channel with inputs a, b, c, d and outputs a, b, c, d]


Every time the channel is used, one of the four symbols will be
transmitted. Therefore, 2 bits are sent per channel use. The system,
however, is very unreliable. For example, if "a" is received, the receiver
cannot determine, reliably, if "a" was transmitted or "d". However, if the
transmitter and receiver agree to only use symbols "a" and "c" and never
use "b" and "d", then the transmission will always be reliable, but 1 bit is
sent per channel use. Therefore, the rate of transmission was the limiting
factor and not reliability.


This is the essence of Shannon's noisy channel coding theorem, i.e., using 
only those inputs whose corresponding outputs are disjoint (e.g., far apart). 
The concept is appealing, but does not seem possible with binary channels 
since the input is either zero or one. It may work if one considers a vector of 
binary inputs referred to as the extension channel. 


Input vector: $X = (X_1, \dots, X_n)^T \in \mathcal{X}^n = \{0, 1\}^n$


Output vector: $Y = (Y_1, \dots, Y_n)^T \in \mathcal{Y}^n = \{0, 1\}^n$


[Figure: extension channel mapping $\mathcal{X}^n$ to $\mathcal{Y}^n$]


This module provides a description of the basic information necessary to 
understand Shannon's Noisy Channel Coding Theorem. However, for 
additional information on typical sequences, please refer to Typical 
Sequences. 


Typical Sequences 


If the binary symmetric channel has crossover probability $\epsilon$ and $x$ is transmitted, then by the Law of
Large Numbers the output $y$ is different from $x$ in $n\epsilon$ places if $n$ is very large.
Equation:
$$d_H(x, y) \approx n\epsilon$$


The number of sequences of length $n$ that are different from $x$ of length $n$ at $n\epsilon$ places is
Equation:
$$\binom{n}{n\epsilon} = \frac{n!}{(n\epsilon)!\, (n - n\epsilon)!}$$


Example:
$x = (000)^T$ and $\epsilon = \frac{1}{3}$ and $n\epsilon = 3 \times \frac{1}{3} = 1$. The number of output sequences different from $x$ by one
element: $\binom{3}{1} = \frac{3!}{1!\, 2!} = 3$, given by $(100)^T$, $(010)^T$, and $(001)^T$.


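Using Stirling's approximation
Equation:
$$n! \approx n^n e^{-n} \sqrt{2\pi n}$$


we can approximate
Equation:
$$\binom{n}{n\epsilon} \approx 2^{n\left(-\epsilon \log_2 \epsilon - (1 - \epsilon) \log_2 (1 - \epsilon)\right)} = 2^{n H_b(\epsilon)}$$


where $H_b(\epsilon) = -\epsilon \log_2 \epsilon - (1 - \epsilon) \log_2 (1 - \epsilon)$ is the entropy of a binary memoryless source. For
any $x$ there are $2^{n H_b(\epsilon)}$ highly probable outputs that correspond to this input.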
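
The quality of the approximation $\binom{n}{n\epsilon} \approx 2^{n H_b(\epsilon)}$ can be examined numerically by comparing exponents per channel use. The sketch below uses exact binomial coefficients; the function names and the chosen block lengths are illustrative.

```python
import math

def binary_entropy(eps):
    """Entropy in bits of a binary source with probabilities eps and 1 - eps."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1.0 - eps) * math.log2(1.0 - eps)

def log2_binomial_rate(n, k):
    """Exact (1/n) * log2 of the number of sequences differing from x in k places."""
    return math.log2(math.comb(n, k)) / n

eps = 0.1
for n in (10, 100, 1000, 10000):
    k = round(n * eps)
    print(f"n = {n:5d}: (1/n) log2 C(n, k) = {log2_binomial_rate(n, k):.4f}, "
          f"H_b = {binary_entropy(eps):.4f}")
```

The exact exponent stays slightly below $H_b(\epsilon)$ and approaches it as $n$ grows, which is all that the counting argument requires.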


Consider the output vector $Y$ as a very long random vector with entropy $n H(Y)$. As discussed earlier, the
number of typical (or highly probable) sequences is roughly $2^{n H(Y)}$. Therefore, $2^n$ is the total number of
binary sequences, $2^{n H(Y)}$ is the number of typical sequences, and $2^{n H_b(\epsilon)}$ is the number of elements in a
group of possible outputs for one input vector. The maximum number of input sequences that produce
nonoverlapping output sequences is

Equation:
$$M = \frac{2^{n H(Y)}}{2^{n H_b(\epsilon)}} = 2^{n (H(Y) - H_b(\epsilon))}$$


[Figure: output space, in which each input produces a group of typical output sequences; nontypical sequences lie outside these groups]


The number of distinguishable input sequences of length $n$ is
Equation:
$$2^{n (H(Y) - H_b(\epsilon))}$$


The number of information bits that can be sent across the channel reliably per $n$ channel uses is
$n (H(Y) - H_b(\epsilon))$. The maximum reliable transmission rate per channel use is
Equation:
$$R = \frac{\log_2 M}{n} = \frac{n (H(Y) - H_b(\epsilon))}{n} = H(Y) - H_b(\epsilon)$$


The maximum rate can be increased by increasing $H(Y)$. Note that $H_b(\epsilon)$ is only a function of the
crossover probability and cannot be reduced any further.


The entropy of the channel output is the entropy of a binary random variable. If the input is chosen to be
uniformly distributed with $p_X(0) = p_X(1) = \frac{1}{2}$, then
Equation:
$$p_Y(0) = (1 - \epsilon)\, p_X(0) + \epsilon\, p_X(1) = \frac{1}{2}$$
and
Equation:
$$p_Y(1) = (1 - \epsilon)\, p_X(1) + \epsilon\, p_X(0) = \frac{1}{2}$$
Then $H(Y)$ takes its maximum value of 1, resulting in a maximum rate $R = 1 - H_b(\epsilon)$ when
$p_X(0) = p_X(1) = \frac{1}{2}$. This result says that ordinarily one bit is transmitted across a BSC with reliability
$1 - \epsilon$. If one needs the probability of error to reach zero then one should reduce transmission of
information to $1 - H_b(\epsilon)$ and add redundancy.


Recall that for Binary Symmetric Channels (BSC)
Equation:
$$H(Y|X) = p_X(0)\, H(Y | X = 0) + p_X(1)\, H(Y | X = 1)$$
$$= p_X(0) \left(-(1 - \epsilon) \log_2 (1 - \epsilon) - \epsilon \log_2 \epsilon\right) + p_X(1) \left(-(1 - \epsilon) \log_2 (1 - \epsilon) - \epsilon \log_2 \epsilon\right)$$
$$= -(1 - \epsilon) \log_2 (1 - \epsilon) - \epsilon \log_2 \epsilon$$
$$= H_b(\epsilon)$$


Therefore, the maximum rate indeed was
Equation:
$$R = H(Y) - H(Y|X) = I(X; Y)$$


Example:

The maximum reliable rate for a BSC is $1 - H_b(\epsilon)$. The rate is 1 when $\epsilon = 0$ or $\epsilon = 1$. The rate is 0
when $\epsilon = \frac{1}{2}$.

This module provides background information necessary for an understanding of Shannon's Noisy
Channel Coding Theorem. It is also closely related to material presented in Mutual Information.


Shannon's Noisy Channel Coding Theorem 


It is highly recommended that the information presented in Mutual 
Information and in Typical Sequences be reviewed before proceeding with 
this document. An introductory module on the theorem is available at Noisy 
Channel Theorems . 

Theorem: Shannon's Noisy Channel Coding Theorem


The capacity of a discrete-memoryless channel is given by
Equation:
$$C = \max_{p_X(x)} I(X; Y)$$


where $I(X; Y)$ is the mutual information between the channel input $X$
and the output $Y$. If the transmission rate $R$ is less than $C$, then for any
$\epsilon > 0$ there exists a code with block length $n$ large enough whose error
probability is less than $\epsilon$. If $R > C$, the error probability of any code with
any block length is bounded away from zero.


Example: 

If we have a binary symmetric channel with crossover probability 0.1,
then the capacity is $C = 1 - H_b(0.1) \approx 0.53$ bits per transmission. Therefore, it is possible to
send 0.4 bits per channel use through the channel reliably. This means that we
can take 400 information bits and map them into a code of length 1000
bits. Then the whole code can be transmitted over the channel. About one
hundred of those bits may be detected incorrectly but the 400 information
bits may be decoded correctly.


Before we consider continuous-time additive white Gaussian channels, let's concentrate on discrete-time Gaussian channels
Equation:

$$Y_i = X_i + \eta_i$$

where the $X_i$'s are information bearing random variables and $\eta_i$ is a Gaussian random variable with variance $\sigma_\eta^2$. The inputs $X_i$ are constrained to have power less than $P$
Equation:

$$\frac{1}{n}\sum_{i=1}^{n} X_i^2 \le P$$

Consider an output block of size $n$
Equation:

$$\mathbf{Y} = \mathbf{X} + \boldsymbol{\eta}$$


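For large $n$, by the Law of Large Numbers,
Equation:

$$\frac{1}{n}\sum_{i=1}^{n} \eta_i^2 = \frac{1}{n}\sum_{i=1}^{n} \left(|y_i - x_i|\right)^2 \le \sigma_\eta^2$$

This indicates that with large probability as $n$ approaches infinity, $\mathbf{Y}$ will be located in an $n$-dimensional sphere of radius $\sqrt{n\sigma_\eta^2}$ centered about $\mathbf{X}$, since $\left(|\mathbf{y} - \mathbf{x}|\right)^2 \le n\sigma_\eta^2$.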
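On the other hand, since the $X_i$'s are power constrained and the $\eta_i$'s and $X_i$'s are independent,
Equation:

$$\frac{1}{n}\sum_{i=1}^{n} y_i^2 \le P + \sigma_\eta^2$$

Equation:

$$|\mathbf{y}|^2 \le n\left(P + \sigma_\eta^2\right)$$

This means $\mathbf{Y}$ is in a sphere of radius $\sqrt{n\left(P + \sigma_\eta^2\right)}$ centered around the origin.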


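How many $\mathbf{X}$'s can we transmit to have nonoverlapping $\mathbf{Y}$ spheres in the output domain? The question is how many spheres of radius $\sqrt{n\sigma_\eta^2}$ fit in a sphere of radius $\sqrt{n\left(P + \sigma_\eta^2\right)}$.

Equation:

$$\frac{\left(\sqrt{n\left(\sigma_\eta^2 + P\right)}\right)^n}{\left(\sqrt{n\sigma_\eta^2}\right)^n} = \left(1 + \frac{P}{\sigma_\eta^2}\right)^{\frac{n}{2}}$$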

Exercise: 


Problem: 
How many bits of information can one send in n uses of the channel? 


Solution:
Equation:

$$\log_2\left(1 + \frac{P}{\sigma_\eta^2}\right)^{\frac{n}{2}}$$

The capacity of a discrete-time Gaussian channel is $C = \frac{1}{2}\log_2\left(1 + \frac{P}{\sigma_\eta^2}\right)$ bits per channel use.
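The sphere-counting argument can be checked numerically: the base-2 logarithm of the ratio of sphere volumes, divided by $n$, should equal the capacity per channel use. The sketch below assumes hypothetical function names and example values ($P = 3$, $\sigma_\eta^2 = 1$) that are not taken from the text.

```python
import math


def gaussian_capacity(power: float, noise_var: float) -> float:
    """Capacity (1/2) log2(1 + P / sigma^2) in bits per channel use."""
    return 0.5 * math.log2(1.0 + power / noise_var)


def bits_in_n_uses(n: int, power: float, noise_var: float) -> float:
    """log2 of the number of noise spheres (radius sqrt(n*sigma^2)) that fit
    in the output sphere (radius sqrt(n*(P + sigma^2))) in n dimensions."""
    log2_big_radius = 0.5 * math.log2(n * (power + noise_var))
    log2_small_radius = 0.5 * math.log2(n * noise_var)
    return n * (log2_big_radius - log2_small_radius)


# Illustrative values only: P = 3 and sigma^2 = 1.
print(gaussian_capacity(3.0, 1.0))
print(bits_in_n_uses(100, 3.0, 1.0) / 100)
```

Working with logarithms of the radii avoids overflow when $n$ is large, since the raw volume ratio grows exponentially in $n$.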


When the channel is a continuous-time, bandlimited, additive white Gaussian channel with noise power spectral density $\frac{N_0}{2}$, input power constraint $P$, and bandwidth $W$, the system can be sampled at the Nyquist rate to provide power per sample $P$ and noise power
Equation:

$$\sigma_\eta^2 = \int_{-W}^{W} \frac{N_0}{2}\,df = W N_0$$

The channel capacity is $\frac{1}{2}\log_2\left(1 + \frac{P}{N_0 W}\right)$ bits per transmission. Since the sampling rate is $2W$, then
Equation:

$$C = \frac{2W}{2}\log_2\left(1 + \frac{P}{N_0 W}\right)\ \text{bits/trans.} \times \text{trans./sec}$$

Equation:

$$C = W\log_2\left(1 + \frac{P}{N_0 W}\right)\ \frac{\text{bits}}{\text{sec}}$$


Example:
The capacity of the voice band of a telephone channel can be determined using the Gaussian model. The bandwidth is 3000 Hz and the signal-to-noise ratio is often 30 dB. Therefore,
Equation:

$$C = 3000\log_2(1 + 1000) \approx 30000\ \frac{\text{bits}}{\text{sec}}$$

One should not expect to design modems faster than 30 kb/s using this model of telephone channels. It is also interesting to note that since the signal-to-noise ratio is large, we are expecting to transmit 10 bits/second/Hertz across telephone channels.
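The telephone-channel figure can be reproduced with a few lines of code. This is a sketch only; the function name `awgn_capacity` and its parameter names are assumptions made for illustration.

```python
import math


def awgn_capacity(bandwidth_hz: float, snr_db: float) -> float:
    """Capacity W * log2(1 + SNR) in bits/sec for a bandlimited AWGN channel."""
    snr_linear = 10 ** (snr_db / 10)  # convert dB to a power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)


capacity = awgn_capacity(3000, 30)
print(f"capacity = {capacity:.0f} bits/sec")
print(f"spectral efficiency = {capacity / 3000:.2f} bits/sec/Hz")
```

The result is slightly below 30000 bits/sec, and the spectral efficiency is slightly below 10 bits/second/Hertz, in line with the rounded values quoted in the example.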


Channel Coding 


Channel coding is a viable method to reduce information rate through the 
channel and increase reliability. This goal is achieved by adding redundancy 
to the information symbol vector resulting in a longer coded vector of 
symbols that are distinguishable at the output of the channel. Another brief 
explanation of channel coding is offered in Channel Coding and the 
Repetition Code. We consider only two classes of codes, block codes and 
convolutional codes. 


Block codes 


The information sequence is divided into blocks of length k. Each block is 
mapped into channel inputs of length n. The mapping is independent from 
previous blocks, that is, there is no memory from one block to another. 


Example:
$k = 2$ and $n = 5$
Equation:

00 → 00000
Equation:

01 → 10100
Equation:

10 → 01111
Equation:

11 → 11011


information sequence → codeword (channel input)


A binary block code is completely defined by $2^k$ binary sequences of length $n$ called codewords.
Equation:

$$\mathcal{C} = \{c_1, c_2, \dots, c_{2^k}\}$$
Equation:

$$c_i \in \{0, 1\}^n$$


There are three key questions, 


1. How can one find "good" codewords? 

2. How can one systematically map information sequences into 
codewords? 

3. How can one systematically find the corresponding information 
sequences from a codeword, i.e., how can we decode? 


These can be done if we concentrate on linear codes and utilize finite field 
algebra. 


A block code is linear if $c_i \in \mathcal{C}$ and $c_j \in \mathcal{C}$ implies $c_i \oplus c_j \in \mathcal{C}$, where $\oplus$ is elementwise modulo-2 addition.


Hamming distance is a useful measure of codeword properties
Equation:

$$d_H(c_i, c_j) = \text{number of places in which } c_i \text{ and } c_j \text{ differ}$$
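Both definitions can be tried on the four codewords of the earlier block code example (00000, 10100, 01111, 11011). The sketch below uses hypothetical helper names; computing the minimum distance is an added illustration and is not discussed in the text at this point.

```python
from itertools import combinations

# Codewords of the (n = 5, k = 2) example code given earlier.
CODE = ["00000", "10100", "01111", "11011"]


def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def xor_words(a: str, b: str) -> str:
    """Elementwise modulo-2 addition of two binary words."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def is_linear(code: list) -> bool:
    """True if the modulo-2 sum of every pair of codewords is a codeword."""
    words = set(code)
    return all(xor_words(a, b) in words for a in code for b in code)


def minimum_distance(code: list) -> int:
    """Smallest Hamming distance between two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))


print(is_linear(CODE))
print(minimum_distance(CODE))
```

For a linear code the minimum distance equals the smallest weight of a nonzero codeword, which is why the codeword 10100 determines the result here.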


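Denote the codeword for information sequence $e_1 = (1, 0, \dots, 0)^T$ by $g_1$, for $e_2 = (0, 1, 0, \dots, 0)^T$ by $g_2$, ..., and for $e_k = (0, \dots, 0, 1)^T$ by $g_k$. Then any information sequence can be expressed as
Equation:

$$u = (u_1, \dots, u_k)^T = \sum_{i=1}^{k} u_i e_i$$

and the corresponding codeword could be
Equation:

$$c = \sum_{i=1}^{k} u_i g_i$$

Therefore

Equation:

$$c = uG$$

with $c \in \{0,1\}^n$ and $u \in \{0,1\}^k$, where $G = \begin{pmatrix} g_1 \\ g_2 \\ \vdots \\ g_k \end{pmatrix}$ is a $k \times n$ matrix and all operations are modulo 2.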
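Example:
In the block code example above, with
Equation:
00 → 00000
Equation:
01 → 10100
Equation:
10 → 01111
Equation:
11 → 11011


$g_1 = (01111)^T$ and $g_2 = (10100)^T$ and $G = \begin{pmatrix} 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 & 0 \end{pmatrix}$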
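Encoding with a generator matrix amounts to a vector-matrix product modulo 2. The following sketch encodes all four information sequences of the example; the rows of `G` are the codewords for 10 and 01 from the mapping above, and the function name `encode` is an illustrative choice.

```python
from itertools import product

# Rows are g_1 (codeword for 10) and g_2 (codeword for 01) from the example.
G = [
    [0, 1, 1, 1, 1],
    [1, 0, 1, 0, 0],
]


def encode(u: list, generator: list) -> list:
    """Return the codeword c = u G with all arithmetic taken modulo 2."""
    n = len(generator[0])
    return [
        sum(u[i] * generator[i][j] for i in range(len(generator))) % 2
        for j in range(n)
    ]


# Enumerate every information sequence of length k = 2.
for u in product([0, 1], repeat=2):
    print(u, "->", encode(list(u), G))
```

Only the $k$ rows of $G$ need to be stored, rather than all $2^k$ codewords, which is the practical benefit of restricting attention to linear codes.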


Additional information about coding efficiency and error are provided in 
Block Channel Coding. 


Examples of good linear codes include Hamming codes, BCH codes, Reed-Solomon codes, and many more. The rate of these codes is defined as $\frac{k}{n}$, and these codes have different error correction and error detection properties.


Convolutional Codes 


Convolutional codes are one type of code used for channel coding. Another 
type of code used is block coding. 


Convolutional codes 


In convolutional codes, each block of $k$ bits is mapped into a block of $n$ bits, but these $n$ bits are determined not only by the present $k$ information bits but also by the previous information bits. This dependence can be captured by a finite state machine.


Example:
A rate $\frac{1}{2}$ convolutional coder ($k = 1$, $n = 2$) with memory length 2 and constraint length 3.

(Figure: block diagram of the convolutional coder.)

Since the length of the shift register is 2, there are 4 different states. The behavior of the convolutional coder can be captured by a 4-state machine with states 00, 01, 10, 11.

For example, arrival of information bit 0 transitions from state 10 to state 01.

The encoding and the decoding process can be realized in a trellis structure.


If the input sequence is
1100

the output sequence would be
11 10 10 11


The transmitted codeword is then 11 10 10 11. If there is one error on the channel, the received sequence could be 11 00 10 11.

(Figure: trellis diagram over the states 00, 01, 10, 11.)


Starting from state 00 the Hamming distance between the possible paths 
and the received sequence is measured. At the end, the path with minimum 
distance to the received sequence is chosen as the correct trellis path. The 
information sequence will then be determined. 


Convolutional coding lends itself to very efficient trellis based encoding 
and decoding. They are very practical and powerful codes. 
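The trellis search described above can be sketched in code. The encoder taps are an assumption, since the coder diagram is not reproduced here: the first output is taken as $u_t \oplus u_{t-2}$ and the second as $u_t \oplus u_{t-1} \oplus u_{t-2}$, chosen because they reproduce the output 11 10 10 11 for input 1100. The function names are hypothetical.

```python
def step(state, bit):
    """One encoder transition. state = (u[t-1], u[t-2]).
    Assumed taps: y1 = u[t] ^ u[t-2], y2 = u[t] ^ u[t-1] ^ u[t-2]."""
    s1, s2 = state
    y1 = bit ^ s2
    y2 = bit ^ s1 ^ s2
    return (bit, s1), (y1, y2)


def encode(bits):
    """Encode a bit list starting from state 00; returns a list of output pairs."""
    state = (0, 0)
    out = []
    for b in bits:
        state, y = step(state, b)
        out.append(y)
    return out


def viterbi_decode(received):
    """Hard-decision decoding: find the trellis path from state 00 with the
    minimum Hamming distance to the received pairs. Returns (bits, distance)."""
    inf = float("inf")
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    metric = {s: (0 if s == (0, 0) else inf) for s in states}
    paths = {s: [] for s in states}
    for r in received:
        new_metric = {s: inf for s in states}
        new_paths = {s: [] for s in states}
        for s in states:
            if metric[s] == inf:
                continue  # state not reachable yet
            for b in (0, 1):
                nxt, y = step(s, b)
                d = metric[s] + (y[0] != r[0]) + (y[1] != r[1])
                if d < new_metric[nxt]:  # keep only the survivor path
                    new_metric[nxt] = d
                    new_paths[nxt] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    best = min(states, key=lambda s: metric[s])
    return paths[best], metric[best]


sent = encode([1, 1, 0, 0])
received = [(1, 1), (0, 0), (1, 0), (1, 1)]  # one channel error in the second pair
decoded, distance = viterbi_decode(received)
print(sent, decoded, distance)
```

Only one survivor path per state is kept at each step, so the work grows linearly with the sequence length rather than exponentially with the number of candidate paths.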


Homework 1 of Elec 430 
Elec 430 homework set 1. Rice University Department of Electrical and Computer Engineering. 
Exercise: 

Problem: 

The current $I$ in a semiconductor diode is related to the voltage $V$ by the relation $I = e^V - 1$. If $V$ is a random variable with density function $f_V(x) = \frac{1}{2}e^{-|x|}$ for $-\infty < x < \infty$, find $f_I(y)$, the density function of $I$.


Exercise: 


Problem: 
Show that if $A \cap B = \{\}$ then $\Pr[A] \le \Pr[B^c]$.


Show that for any $A$, $B$, $C$ we have
$\Pr[A \cup B \cup C] = \Pr[A] + \Pr[B] + \Pr[C] - \Pr[A \cap B] - \Pr[A \cap C] - \Pr[B \cap C] + \Pr[A \cap B \cap C]$.


Show that if $A$ and $B$ are independent then $\Pr[A \cap B^c] = \Pr[A]\Pr[B^c]$, which means $A$ and $B^c$ are also independent.
Exercise: 


Problem: 
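Suppose $X$ is a discrete random variable taking values $\{0, 1, 2, \dots, n\}$ with the following probability mass function

$$p_X(k) = \begin{cases} \frac{n!}{k!(n-k)!}\theta^k(1-\theta)^{n-k} & \text{if } k \in \{0, 1, 2, \dots, n\} \\ 0 & \text{otherwise} \end{cases}$$

with parameter $\theta \in [0, 1]$.


Find the characteristic function of $X$.


Find $\overline{X}$ and $\sigma_X^2$.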


Note:See problems 3.14 and 3.15 in Proakis and Salehi 


Exercise: 
Problem: 
Consider outcomes of a fair die, $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \omega_6\}$. Define events $A = \{\omega \mid \text{an even number appears}\}$ and $B = \{\omega \mid \text{a number less than 5 appears}\}$. Are these events disjoint? Are they independent? (Show your work!)


Exercise: 
Problem: This is problem 3.5 in Proakis and Salehi. 


An information source produces 0 and 1 with probabilities 0.3 and 0.7, respectively. The output of the source is transmitted via a channel that has a probability of error (turning a 1 into a 0 or a 0 into a 1) equal to 0.2.
What is the probability that a 1 is observed at the output?
What is the probability that a 1 was the output of the source if a 1 is observed at the output of the channel?
Exercise: 
Problem: 
Suppose $X$ and $Y$ are each Gaussian random variables with means $\mu_X$ and $\mu_Y$ and variances $\sigma_X^2$ and $\sigma_Y^2$. Assume that they are also independent. Show that $Z = X + Y$ is also Gaussian. Find the mean and variance of $Z$.


Homework 2 of Elec 430 


Elec 430 homework set 2. Rice University Department of Electrical and 
Computer Engineering. 


Problem 1 


Suppose $A$ and $B$ are two Gaussian random variables, each zero mean, with $\overline{A^2} < \infty$ and $\overline{B^2} < \infty$. The correlation between them is denoted by $\overline{AB}$. Define the random processes $X_t = A + Bt$ and $Y_t = B + At$.


a) Find the mean, autocorrelation, and crosscorrelation functions of $X_t$ and $Y_t$.

b) Find the 1st order density of $X_t$, $f_{X_t}(x)$.

c) Find the conditional density of $X_{t_2}$ given $X_{t_1}$, $f_{X_{t_2}|X_{t_1}}(x_2|x_1)$. Assume $t_2 > t_1$.


Note: see Proakis and Salehi problem 3.28


d) Is $X_t$ wide sense stationary?


Problem 2 


Show that if $X_t$ is second-order stationary, then it is also first-order stationary.


Problem 3 


Let a stochastic process $X_t$ be defined by $X_t = \cos(\Omega t + \Theta)$ where $\Omega$ and $\Theta$ are statistically independent random variables. $\Theta$ is uniformly distributed over $[-\pi, \pi]$ and $\Omega$ has an unknown density $f_\Omega(\omega)$.


a) Compute the expected value of $X_t$.

b) Find an expression for the correlation function of $X_t$.
c) Is $X_t$ wide sense stationary? Show your reasoning.
d) Find the first-order density function $f_{X_t}(x)$.


Homework 5 of Elec 430 


Problem 1 


Consider a ternary communication system where the source produces three 
possible symbols: 0, 1, 2. 


a) Assign three modulation signals $s_1(t)$, $s_2(t)$, and $s_3(t)$ defined on $t \in [0, T]$ to these symbols, 0, 1, and 2, respectively. Make sure that these signals are not orthogonal and assume that the symbols have an equal probability of being generated.


b) Consider an orthonormal basis $\psi_1(t), \psi_2(t), \dots, \psi_N(t)$ to represent these three signals. Obviously $N$ could be either 1, 2, or 3.


(Figure: receiver structure ending in an ML detector.)


Now consider two different receivers to decide which one of the symbols was transmitted when $r_t = s_m(t) + N_t$ is received, where $m \in \{1, 2, 3\}$ and $N_t$ is a zero mean white Gaussian process with $S_N(f) = \frac{N_0}{2}$ for all $f$.
What is $f_{r|s_m(t)}(\cdot)$ and what is $f_{Y|s_m(t)}(\cdot)$?


(Figure: second receiver structure ending in an ML detector.)


Find the probability that $\hat{m} \ne m$ for both receivers: $P_e = \Pr[\hat{m} \ne m]$.


Problem 2 


Proakis and Salehi problems 7.18, 7.26, and 7.32 


Problem 3 


Suppose our modulation signals are $s_1(t)$ and $s_2(t)$ where $s_1(t) = e^{-t^2}$ for all $t$ and $s_2(t) = -s_1(t)$. The channel noise is AWGN with zero mean and spectral height $\frac{N_0}{2}$. The signals are equally likely to be transmitted.


(Figure: the received signal $r_t = s_m(t) + N_t$ is filtered, sampled at $t = T$, and passed to a threshold device.)


Find the impulse response of the optimum filter. Find the signal component of the output of the matched filter at $t = T$ when $s_m(t)$ is transmitted; i.e., $u_m(T)$. Find the probability of error $\Pr[\hat{m} \ne m]$.


In this part, assume that the power spectral density of the noise is not flat 
and in fact is 
Equation: 
1 
MS 
(rf)? + a? 


for all f, where a is real and positive. Can you show that the optimum filter 
in this case is a cascade of two filters, one to whiten the noise and one to 
match to the signal at the output of the whitening filter? 


(Figure: the received signal $r_t = s_m(t) + N_t$ is passed through the filter cascade and then to a threshold device.)


c) Find an expression for the probability of error. 


Homework 3 of Elec 430 
Exercise: 


Problem: 


Suppose that a white Gaussian noise $X_t$ is input to a linear system with transfer function given by


Equation:

$$H(f) = \begin{cases} 1 & \text{if } |f| \le 2 \\ 0 & \text{if } |f| > 2 \end{cases}$$


Suppose further that the input process is zero mean and has spectral height $\frac{N_0}{2} = 5$. Let $Y_t$ denote the resulting output process.


1. Find the power spectral density of $Y_t$. Find the autocorrelation of $Y_t$ (i.e., $R_Y(\tau)$).

2. Form a discrete-time process (that is, a sequence of random variables) by sampling $Y_t$ at time instants $T$ seconds apart. Find a value for $T$ such that these samples are uncorrelated. Are these samples also independent?

3. What is the variance of each sample of the output process?


(Figure: $X_t$ is passed through the filter $h$ to give $Y_t$, which is sampled every $T$ seconds to give $Z_k = Y_{kT}$ for $k = \dots, -1, 0, 1, 2, \dots$)


Exercise: 


Problem: 


Suppose that $X_t$ is a zero mean white Gaussian process with spectral height $\frac{N_0}{2} = 5$. Denote $Y_t$ as the output of an integrator when the input is $X_t$.


(Figure: $X_t$ is passed through the integrator $h$ to give $Y_t$, which is sampled every $T$ seconds to give $Z_k = Y_{kT}$ for $k = \dots, -1, 0, 1, 2, \dots$)


1. Find the mean function of $Y_t$. Find the autocorrelation function of $Y_t$, $R_Y(t + \tau, t)$.

2. Let $Z_k$ be a sequence of random variables that have been obtained by sampling $Y_t$ every $T$ seconds and dumping the samples, that is
Equation:

$$Z_k = \int_{(k-1)T}^{kT} X_\tau \, d\tau$$

Find the autocorrelation of the discrete-time process $Z_k$, that is, $R_Z(k + m, k) = E(Z_{k+m} Z_k)$.

3. Is $Z_k$ a wide sense stationary process?


Exercise: 


Problem: Proakis and Salehi, problem 3.63, parts 1, 3, and 4 


Exercise: 


Problem: Proakis and Salehi, problem 3.54 


Exercise: 


Problem: Proakis and Salehi, problem 3.62 


Exercises on Systems and Density 
Exercise: 


Problem: Consider the following system


(Figure: a modulator, followed by a sample-at-$T$-and-dump stage producing $Z_T$, followed by a threshold device.)


Assume that $N_t$ is a white Gaussian process with zero mean and spectral height $\frac{N_0}{2}$.


If $b$ is "0" then $X_\tau = A p_T(\tau)$ and if $b$ is "1" then $X_\tau = (-A) p_T(\tau)$, where

$$p_T(\tau) = \begin{cases} 1 & \text{if } 0 \le \tau \le T \\ 0 & \text{otherwise} \end{cases}$$

Suppose $\Pr[b = 1] = \Pr[b = 0] = \frac{1}{2}$.


1. Find the probability density function of $Z_T$ when bit "0" is transmitted and also when bit "1" is transmitted. Refer to these two densities as $f_{Z_T|H_0}(z)$ and $f_{Z_T|H_1}(z)$, where $H_0$ denotes the hypothesis that bit "0" is transmitted and $H_1$ denotes the hypothesis that bit "1" is transmitted.

2. Consider the ratio of the above two densities; i.e.,

Equation:

$$\Lambda(z) = \frac{f_{Z_T|H_0}(z)}{f_{Z_T|H_1}(z)}$$

and its natural log $\ln(\Lambda(z))$. A reasonable scheme to decide which bit was actually transmitted is to compare $\ln(\Lambda(z))$ to a fixed threshold $\gamma$. ($\Lambda(z)$ is often referred to as the likelihood function and $\ln(\Lambda(z))$ as the log likelihood function.) Given that threshold $\gamma$ is used to decide $\hat{b} = 0$ when $\ln(\Lambda(z)) \ge \gamma$, find $\Pr[\hat{b} \ne b]$ (note that we will say $\hat{b} = 1$ when $\ln(\Lambda(z)) < \gamma$).

3. Find a $\gamma$ that minimizes $\Pr[\hat{b} \ne b]$.
Exercise: 


Problem: Proakis and Salehi, problems 7.7, 7.17, and 7.19 


Exercise: 


Problem: Proakis and Salehi, problem 7.20, 7.28, and 7.23 


Homework 6 of Elec 430 


Homework set 6 of ELEC 430, Rice University, Department of Electrical 
and Computer Engineering 


Problem 1 
Consider the following modulation system
Equation:

$$s_0(t) = A P_T(t) - 1$$
and
Equation:

$$s_1(t) = -\left(A P_T(t)\right) - 1$$

for $0 \le t \le T$, where

$$P_T(t) = \begin{cases} 1 & \text{if } 0 \le t \le T \\ 0 & \text{otherwise} \end{cases}$$

The channel is ideal with Gaussian noise which has mean $\mu_N(t) = 1$ for all $t$ and is wide sense stationary with $R_N(\tau) = b^2 e^{-|\tau|}$ for all $\tau \in \mathbb{R}$. Consider the following receiver structure


(Figure: receiver structure.)


a) Find the optimum value of the threshold for the system (that is, the one that minimizes $P_e$). Assume that $\pi_0 = \pi_1$.
b) Find the error probability when this threshold is used.


Problem 2 


Consider a PAM system where symbols $a_1, a_2, a_3, a_4$ are transmitted, where $a_n \in \{2A, A, -A, -2A\}$. The transmitted signal is
Equation:

$$X_t = \sum_{n=1}^{4} a_n s(t - nT)$$

where $s(t)$ is a rectangular pulse of duration $T$ and height 1. Assume that we have a channel with impulse response $g(t)$, which is a rectangular pulse of duration $T$ and height 1, with white Gaussian noise with $S_N(f) = \frac{N_0}{2}$ for all $f$.


a) Draw a typical sample path (realization) of $X_t$ and of the received signal $r_t$ (do not forget to add a bit of noise!)

b) Assume that the receiver knows $g(t)$. Design a matched filter for this transmission system.

c) Draw a typical sample path of $Y_t$, the output of the matched filter (do not forget to add a bit of noise!)

d) Find an expression for (or draw) $u(nT)$ where $u(t) = s * g * h^{opt}(t)$.


Problem 3 


Proakis and Salehi, problem 7.35 


Problem 4 


Proakis and Salehi, problem 7.39 


Homework 7 of Elec 430 
Exercise: 


Problem: 


Consider an On-Off Keying system where $s_1(t) = A\cos(2\pi f_c t + \theta)$ for $0 \le t \le T$ and $s_2(t) = 0$ for $0 \le t \le T$. The channel is ideal AWGN with zero mean and spectral height $\frac{N_0}{2}$.


1. Assume $\theta$ is known at the receiver. What is the average probability of bit-error using an optimum receiver?

2. Assume that we estimate the receiver phase to be $\hat{\theta}$ and that $\hat{\theta} \ne \theta$. Analyze the performance of the matched filter with the wrong phase, that is, examine $P_e$ as a function of the phase error.

3. When does noncoherent become preferable? (You can find an expression for the $P_e$ of noncoherent receivers for OOK in your textbook.) That is, how big should the phase error be before you would switch to noncoherent?


Exercise: 


Problem: Proakis and Salehi, Problems 9.4 and 9.14 
Exercise: 


Problem: 


A coherent phase-shift keyed system operating over an AWGN 
channel with two sided power spectral density * uses 

s t Apr t wt @ ands ¢t Apr t wt 6 
where 2 2 6; “ are constants and that 

fel with w, 1a ee 


1. Suppose 8 and @ are known constants and that the optimum 
receiver uses filters matched to s ¢ ands t .What are the 
values of P. and P. ? 


2. Suppose 8 and @ are unknown constants and that the receiver 
filters are matched tos t¢ Apr t wt and 
s t Apr t w-t am and the threshold is zero. 


Note: Use a correlation receiver structure. 


What are P. and P. now? What are the minimum values of P. 
and P, (asa function of 8 and@ )? 


Homework 8 of Elec 430 
Exercise: 


Problem: Proakis and Salehi, Problems 9.15 and 9.16 


Exercise: 


Problem: Proakis and Salehi, Problem 9.21 


Exercise: 


Problem: Proakis and Salehi, Problems 4.1, 4.2, and 4.3 


Exercise: 


Problem: Proakis and Salehi, Problems 4.5 and 4.6 


Homework 9 of Elec 430 
Exercise: 


Problem: Proakis and Salehi, Problems 4.22 and 4.28 


Exercise: 


Problem: Proakis and Salehi, Problems 4.21 and 4.25 


Exercise: 


Problem: Proakis and Salehi, Problems 10.1 and 10.6 


Exercise: 


Problem: Proakis and Salehi, Problems 10.8 and 10.9 
Exercise: 


Problem: 


For this problem of the homework, please either make up a problem 
relevant to chapters 6 or 7 of the notes or find one from your text book 
or other books on Digital Communication, state the problem clearly 
and carefully and then solve. 


Note:If you would like to choose one from your textbook, please 
reserve your problem on the white board in my office. (You may not 
pick a problem that has already been reserved.) 


Please write the problem and its solution on separate pieces of paper so 
that I can easily reproduce and distribute them to others in the class. 


